

68 


((classif$9 near10 packet$1).ti. or


USPAT; 


2004/01/16 






(classif$9 near10 packet$1).ab.) and


EPO; 


09:56 






(709/$.ccls. or 370/$.ccls.)


DERWENT; 










USOCR 




_ 


31


(((classif$9 near10 packet$1).ti. or


USPAT; 


2004/01/16 






(classif$9 near10 packet$1).ab.) and


EPO; 


09:33 






(709/$.ccls. or 370/$.ccls.)) and


DERWENT; 








@ad<19981228 


USOCR 




_ 


12 


6260072.URPN.


USPAT 


2004/01/16 










09:31 


_ 


37 


("4556972" | "5121383" | "5253248" |


USPAT


2004/01/16


"5313454" | "5377327" | "5412654" |


09:32


"5467345" | "5491801" | "5495479" |


"5506847" | "5557607" | "5570346" |


"5583861" | "5583862" | "5594734" |


"5600630" | "5666360" | "5671222" |


"5671445" | "5699361" | "5740164" |


"5790536" | "5802278" | "5805816" |


"5812526" | "5828844" | "5844887" |


"5905712" | "5910942" | "5920566" |


"5920568" | "5930254" | "5940372" |


"5963546" | "5970232" | "5996021" |


"6016306").PN.






_ 


2 


5848233.pn.


USPAT; 


2004/01/16 






EPO; 


09:55 








DERWENT; 










USOCR 




_ 


23806 


filter$3 same day 


USPAT; 


2004/01/16 








EPO; 


09:56 








DERWENT; 










USOCR 




- 


103 


(filter$3 same day) same (classif$9 same 


USPAT; 


2004/01/16 






filter$1)


EPO; 


09:56 








DERWENT; 










USOCR 




_ 


3 


((filter$3 same day) same (classif$9 same


USPAT; 


2004/01/16 






filter$1)) and (709/$.ccls. or


EPO; 


11:04 






370/$.ccls.)


DERWENT; 










USOCR 






35 


tos near3 header 


USPAT; 


2004/01/16 








EPO; 


11:10








DERWENT; 










USOCR 




_ 


17 


(tos near3 header) and classif$8 


USPAT; 


2004/01/16 








EPO; 


11:11 








DERWENT; 










USOCR 




_ 


13 


(tos near3 header) and filter$3 


USPAT; 


2004/01/16 






EPO; 


11:11 








DERWENT; 










USOCR 






9 


((tos near3 header) and classif$8) and 


USPAT; 


2004/01/16 






((tos near3 header) and filter$3) 


EPO; 


11:11 








DERWENT; 










USOCR 




_ 


19 


(("6195697") or ("6385609") or 


USPAT; 


2004/01/20 






("6594786") or ("5905715") or ("6446200") 


US-PGPUB; 


14:26 






or ("5872928") or ("5974237") or 


EPO; JPO; 








("6259679") or ("6405251")).PN.


DERWENT; 










IBM TDB 




_ 


3 


((("6195697") or ("6385609") or 


USPAT; 


2004/01/20 






("6594786") or ("5905715") or ("6446200") 


US-PGPUB; 


14:26 






or ("5872928") or ("5974237") or 


EPO; JPO; 








("6259679") or ("6405251")).PN.) and


DERWENT; 








filter$3 


IBM TDB 






4804 


(cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)


EPO; 


10:21 








DERWENT; 










USOCR
-


61142 


(classif$9 or filter$1) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)


EPO; 


10:21 








DERWENT; 










USOCR 




- 


366 


((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) same ((classif$9


EPO; 


10:21 






or filter$1) same (packet$1 or frame$1 or


DERWENT ; 








datagram$1))


USOCR 




- 


2038 


((370/229) or (370/230) or (370/230.1) or 


USPAT; 


2004/01/21 






(370/235)).CCLS.


US-PGPUB; 


10:39 








EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


22 


(((370/229) or (370/230) or (370/230.1) 


USPAT; 


2004/01/21 






or (370/235)).CCLS.) and (((cos or tos


US-PGPUB; 


12:07 






or qos) same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1)) same ((classif$9 or


DERWENT ; 








filter$1) same (packet$1 or frame$1 or


IBM_TDB 








datagram$1)))






- 


8 


("5313455" | "5463620" | "5781532" |


USPAT


2004/01/21


"6104700" | "6167027" | "6188698" |


11:25


"6222844" | "6381649").PN.






- 


330 


709/$.ccls. and (((370/229) or (370/230) 


USPAT; 


2004/01/21 






or (370/230.1) or (370/235)).CCLS.)


US-PGPUB; 


11:01 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


10750 


709/22$.ccls.


USPAT; 


2004/01/21 








US-PGPUB; 


11:01 








EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


143 


709/22$.ccls. and (((370/229) or


USPAT; 


2004/01/21 






(370/230) or (370/230.1) or 


US-PGPUB; 


11:01 






(370/235)).CCLS.)


EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


17 


(709/22$.ccls. and (((370/229) or


USPAT; 


2004/01/21 






(370/230) or (370/230.1) or 


US-PGPUB; 


11:09 






(370/235)).CCLS.)) and ((classif$9 or


EPO; JPO; 








filter$1) same (packet$1 or frame$1 or


DERWENT; 








datagram$1))


IBM TDB 




- 


6 


6412000.URPN.


USPAT 


2004/01/21 










11:08 




10 


("5251152" | "5495426" | "5838919" | 


USPAT 


2004/01/21 






"5870561" | "5903559" | "5923849" | 




11:08


"6028842" | "6046980" | "6137782" |


"6209033").PN.






- 


1 


((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) and


US-PGPUB; 


11:09 






((709/22$.ccls. and (((370/229) or


EPO; JPO; 








(370/230) or (370/230.1) or 


DERWENT; 








(370/235)).CCLS.)) and ((classif$9 or


IBM_TDB 








filter$1) same (packet$1 or frame$1 or










datagram$1)))






- 


27 


709/22$.ccls. and (((cos or tos or qos)


USPAT; 


2004/01/21 






same (packet$1 or frame$1 or datagram$1))


US-PGPUB; 


11:09 






same ((classif$9 or filter$1) same


EPO; JPO; 








(packet$1 or frame$1 or datagram$1)))


DERWENT; 










IBM TDB 




- 


3 


((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) and (("5313455" |


US-PGPUB; 


11:25 






"5463620" | "5781532" | "6104700" | 


EPO; JPO;


"6167027" | "6188698" | "6222844" |


DERWENT;


"6381649").PN.)


IBM TDB 






3 


(((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) and (("5313455" |


US-PGPUB; 


11:26 






"5463620" | "5781532" | "6104700" | 


EPO; JPO; 








"6167027" | "6188698" | "6222844" | 


DERWENT;


"6381649").PN.)) and (tos or cos or qos)


IBM TDB
_


2 


((classif$9 or filter$1) same (packet$1


USPAT;


2004/01/21 






or frame$1 or datagram$1)) and ((((cos or


US-PGPUB; 


11:27 






tos or qos) same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1)) and (("5313455" | "5463620"


DERWENT; 








| "5781532" | "6104700" | "6167027" |


IBM_TDB


"6188698" | "6222844" | "6381649").PN.))










and (tos or cos or qos))






_ 


2 


((709/22$.ccls. and (((370/229) or


USPAT;


2004/01/21 






(370/230) or (370/230.1) or 


US-PGPUB; 


11:27 






(370/235)).CCLS.)) and ((classif$9 or


EPO; JPO; 








filter$1) same (packet$1 or frame$1 or


DERWENT ; 








datagram$1))) and (tos or qos or cos)


IBM TDB 




_ 


122693 


filter$1 near5 (delet$4 or expir$6 or


USPAT; 


2004/01/21 






remov$3) 


US-PGPUB; 


12:09 








EPO; JPO; 










DERWENT ; 










IBM TDB 




_ 


148 


(filter$1 near5 (delet$4 or expir$6 or


USPAT; 


2004/01/21 






remov$3)) and 709/22$.ccls.


US-PGPUB; 


12:08 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


0 


((filter$1 near5 (delet$4 or expir$6 or


USPAT;


2004/01/21 






remov$3)) and 709/22$.ccls.) and (((cos


US-PGPUB; 


12:08 






or tos or qos) same (packet$1 or frame$1


EPO; JPO; 








or datagram$1)) same ((classif$9 or


DERWENT; 








filter$1) same (packet$1 or frame$1 or


IBM_TDB 








datagram$1)))






_ 


7766 


(filter$1 near5 (delet$4 or expir$6 or


USPAT; 


2004/01/21 






remov$3)) and ((classif$9 or filter$1)


US-PGPUB; 


12:08 






same (packet$1 or frame$1 or datagram$1))


EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


41 


((filter$1 near5 (delet$4 or expir$6 or


USPAT; 


2004/01/21 






remov$3)) and ((classif$9 or filter$1)


US-PGPUB; 


12:09 






same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1))) and ((cos or tos or qos)


DERWENT; 








same (packet$1 or frame$1 or datagram$1))


IBM TDB 




- 


754 


dynamic adj filter$1


USPAT;


2004/01/21 








US-PGPUB; 


12:09 








EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


1 


(dynamic adj filter$1) and (((370/229) or


USPAT; 


2004/01/21 






(370/230) or (370/230.1) or 


US-PGPUB; 


12:09 






(370/235)).CCLS.)


EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


190 


(((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) same ((classif$9


US-PGPUB; 


12:10 






or filter$1) same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1))) and (delet$4 or expir$6 or


DERWENT; 








remov$3) 


IBM TDB 




- 


16 


((((cos or tos or qos) same (packet$1 or


USPAT; 


2004/01/21 






frame$1 or datagram$1)) same ((classif$9


US-PGPUB; 


13:21 






or filter$1) same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1))) and (delet$4 or expir$6 or


DERWENT; 








remov$3)) and (((370/229) or (370/230) or 


IBM_TDB 








(370/230.1) or (370/235)).CCLS.)






_ 


2 


((((370/229) or (370/230) or (370/230.1)


USPAT; 


2004/01/21 






or (370/235)).CCLS.) and (((cos or tos


US-PGPUB; 


13:21 






or qos) same (packet$1 or frame$1 or


EPO; JPO; 








datagram$1)) same ((classif$9 or


DERWENT; 








filter$1) same (packet$1 or frame$1 or


IBM TDB 








datagram$1)))) and (timeout or (time adj










out$1))
22


((((370/229) or (370/230) or (370/230.1) 
or (370/235)).CCLS.) and (((cos or tos
or qos) same (packet$1 or frame$1 or
datagram$1)) same ((classif$9 or
filter$1) same (packet$1 or frame$1 or
datagram$1)))) and (expir$8 or delet$4 or
remov$3 or time or (time adj out$1) or
timeout$1)


USPAT; 
US-PGPUB; 
EPO; JPO; 
DERWENT; 
IBM TDB 


2004/01/21 
13:22


L Number


Hits 


Search Text 


DB 


Time stamp 




2 


bandwidth adj broker$3 


USPAT; 


2001/09/22 








EPO; 


18:52 








DERWENT; 










USOCR 






22 


bandwidth same broker$3 


USPAT; 


2001/09/22 








EPO; 


18:56 








DERWENT; 










USOCR 




_ 


11936 


service near20 level$1


USPAT; 


2001/09/22 








EPO; 


19:05 








DERWENT; 










USOCR 






3561 


tos or qos 


USPAT; 


2001/09/22 








EPO; 


19:07 








DERWENT; 










USOCR 




- 


342 


(service near20 level$1) and (tos or qos)


USPAT;


2001/09/22 








EPO; 


19:06 








DERWENT; 










USOCR 




- 


1 


(bandwidth same broker$3) and ((service


USPAT; 


2001/09/22 






near20 level$1) and (tos or qos))


EPO; 


19:06 








DERWENT; 










USOCR 




- 


2441 


tos 


USPAT;


2001/09/22 








EPO; 


19:07 








DERWENT; 










USOCR 




- 


4 


tos same (service near20 level$1)


USPAT; 


2001/09/22 








EPO; 


19:09 








DERWENT; 










USOCR 




- 


304 


(service near20 level$1) same filter$3


USPAT; 


2001/09/22 








EPO; 


19:10








DERWENT ; 










USOCR 




- 


0 


((service near20 level$1) same filter$3)


USPAT; 


2001/09/22 






and tos 


EPO; 


19:10








DERWENT; 










USOCR 




- 


261 


bandwidth$1 and broker$3


USPAT; 


2001/09/22 








EPO; 


19:10 








DERWENT; 










USOCR 




- 


3 


((service near20 level$1) same filter$3)


USPAT; 


2001/09/22 






and (bandwidth$1 and broker$3)


EPO; 


19:11 








DERWENT; 










USOCR 




- 


4909 


service near3 level$1


USPAT; 


2001/09/22 








EPO; 


19:11 








DERWENT; 










USOCR 




_ 


9939 


admission same control$4 


USPAT; 


2001/09/22 








EPO; 


19:12








DERWENT; 










USOCR 




- 


19 


(service near3 level$1) same (admission


USPAT; 


2001/09/22 






same control$4) 


EPO; 


19:21 








DERWENT; 










USOCR 




- 


9448 


ingress and egress 


USPAT; 


2001/09/22 








EPO; 


19:21 








DERWENT ; 










USOCR 






3585 


bandwidth adj (allocat$4 or broker$5 or 


USPAT; 


2001/09/22 






control$4 or request$3) 


EPO; 


19:22 








DERWENT; 










USOCR
-


57 


(ingress and egress) and (bandwidth adj


USPAT;


2001/09/22 






(allocat$4 or broker$5 or control$4 or 


EPO; 


19:22 






request$3))


DERWENT; 










USOCR 




— 


13 


(service near3 level$1) and ((ingress and


USPAT; 


2001/09/22 






egress) and (bandwidth adj (allocat$4 or


EPO; 


19:26 






broker$5 or control$4 or request$3))) 


DERWENT ; 










USOCR 




- 


168 


(admission$1 or access) adj profil$3


USPAT; 


2001/09/22 








EPO; 


19:27 








DERWENT ; 










USOCR 




- 


1190 


((709/226) or (709/225) or


USPAT; 


2001/09/22 






(709/229)).CCLS.


EPO; 


19:27 








DERWENT; 










USOCR 




- 


7 


((admission$1 or access) adj profil$3)


USPAT; 


2001/09/22 






and ((("709/226") or ("709/225") or


EPO; 


19:27 






("709/229")).CCLS.)


DERWENT ; 










USOCR 




- 


59 


policy adj server$1


USPAT; 


2001/09/25 








EPO; 


12:04 








DERWENT; 










USOCR 




- 


675 


713/15$.ccls.


USPAT; 


2001/09/25 








EPO; 


12:04 








DERWENT; 










USOCR 




- 


6 


(policy adj server$1) and 713/15$.ccls.


USPAT; 


2001/09/25 








EPO; 


12:07 








DERWENT ; 










USOCR 




- 


1 


admission$1 adj profile$1


USPAT; 


2001/09/25 








EPO; 


12:29 








DERWENT; 










USOCR 




- 


4276 


dynamic$5 near5 filter$3


USPAT; 


2001/09/25 








EPO; 


12:09 








DERWENT; 










USOCR 




- 


2 


(dynamic$5 near5 filter$3) and (policy 


USPAT; 


2001/09/25 






adj server$1)


EPO; 


12:29 








DERWENT; 










USOCR 




- 


306 


filter$3 adj criteria$1


USPAT; 


2001/09/25 








EPO; 


12:30 








DERWENT; 










USOCR 




- 


39 


admission$1 adj10 profile$1


USPAT; 


2001/09/25 








EPO; 


12:29 








DERWENT; 










USOCR 




- 


0 


(filter$3 adj criteria$1) and


USPAT; 


2001/09/25 






(admission$1 adj10 profile$1)


EPO; 


12:29 








DERWENT; 










USOCR 




- 


17 


(dynamic$5 near5 filter$3) and (filter$3 


USPAT; 


2001/09/25 






adj criteria$1)


EPO; 


12:35 








DERWENT ; 










USOCR 




- 


47 


ingress near4 profil$3 


USPAT; 


2001/09/25 








EPO; 


12:35 








DERWENT; 










USOCR 






0 


(admission$1 adj10 profile$1) and


USPAT; 


2001/09/25 






(ingress near4 profil$3) 


EPO; 


12:35 








DERWENT; 










USOCR
_


0 


(ingress near4 profil$3) and 


USPAT;


2001/09/25 






713/15$.ccls.


EPO; 


12:35 








DERWENT; 










USOCR 




_ 


3 


(ingress near4 profil$3) and filter$3 


USPAT; 


2001/09/25 








EPO; 


12:38 








DERWENT; 










USOCR 




- 


102511 


remov$3 near10 filter$1


USPAT; 


2001/09/25 








EPO; 


12:40 








DERWENT; 










USOCR 




- 


84571 


remov$3 near5 filter$1


USPAT; 


2001/09/25 








EPO; 


12:42 








DERWENT; 










USOCR 




- 


0 


(admission$1 adj10 profile$1) and


USPAT; 


2001/09/25 






(remov$3 near5 filter$1)


EPO; 


12:42 








DERWENT; 










USOCR 




- 


2 


(policy adj server$l) and (remov$3 near5 


USPAT; 


2001/09/25 






filter$1)


EPO; 


12:43 








DERWENT ; 










USOCR 




- 


4 


713/15$.ccls. and (remov$3 near5


USPAT; 


2001/09/25 






filter$1)


EPO; 


12:43 








DERWENT ; 










USOCR 




- 


867 


filter$3 near5 (delet$3 or expir$6)


USPAT; 


2001/09/26 








EPO; 


10:48








DERWENT; 










USOCR 




- 


8494 


713/$.ccls. 


USPAT; 


2001/09/26 








EPO; 


10:48








DERWENT; 










USOCR 




- 


0 


(dynamically with (creat$3 or establish$3 


USPAT; 


2001/09/26 






or remov$3 or delet$3) with filter$1) and


EPO; 


10:48






713/$.ccls.


DERWENT; 










USOCR 




- 


20 


(filter$3 near5 (delet$3 or expir$6)) and


USPAT; 


2001/09/26 






713/$.ccls.


EPO; 


12:24 








DERWENT; 










USOCR 




- 


89 


dynamically with (creat$3 or establish$3 


USPAT; 


2001/09/26 






or remov$3 or delet$3) with filter$1


EPO; 


10:53 








DERWENT ; 










USOCR 




- 


9 


dynamically near3 (creat$3 or establish$3 


USPAT; 


2001/09/26 






or remov$3 or delet$3) near3 filter$1


EPO; 


11:45 








DERWENT; 










USOCR 




- 


2 


("6148336").PN.


USPAT; 


2001/09/26 








EPO; 


11:45








DERWENT; 










USOCR 




- 


1 


(("6148336").PN.) and (remov$3 or delet$3


USPAT; 


2001/09/26 






or expir$7) 


EPO; 


12:05 








DERWENT; 










USOCR 




- 


1 


((("6148336").PN.) and (remov$3 or


USPAT; 


2001/09/26 






delet$3 or expir$7)) and packet$1


EPO; 


12:06 








DERWENT; 










USOCR 






1 


((("6148336").PN.) and (remov$3 or


USPAT; 


2001/09/26 






delet$3 or expir$7)) and (destination$1


EPO; 


12:09 






or source$1)


DERWENT; 










USOCR
-


0 


(("6148336").PN.) and (admission adj


USPAT; 


2001/09/26 






profile$1)


EPO; 


12:10 








DERWENT; 










USOCR 




- 


0 


(("6148336").PN.) and (admission near4


USPAT; 


2001/09/26 






profile$1)


EPO; 


12:10 








DERWENT; 










USOCR 




- 


1 


(("6148336").PN.) and polic$3


USPAT; 


2001/09/26 






EPO; 


12:25 








DERWENT; 










USOCR 




- 


1 


(("6148336").PN.) and (policy or


USPAT; 


2001/09/26 






policies) 


EPO; 


12:25 








DERWENT; 










USOCR 




- 


26 


(("5554322") or ("5884033") or


USPAT; 


2001/09/29 






("5953338") or ("5968176") or ("6055571") 


EPO; 


18:39 






or ("6130924") or ("6148336") or


DERWENT; 








("6167451") or ("6167445") or ("6178505") 


USOCR 








or ("6199113") or ("6256741") or










("6262974") or ("6295527")).PN.






- 


3 


("5544322").PN.


USPAT; 


2001/09/29 








EPO; 


18:39 








DERWENT ; 










USOCR 




- 


4260 


relational adj database 


USPAT; 


2002/04/11 








EPO; 


13:05 








DERWENT; 










USOCR 




- 


11775 


707/$.ccls. 


USPAT; 


2002/04/11 








EPO; 


13:05 








DERWENT; 










USOCR 




- 


1757 


(relational adj database) and 707/$.ccls.


USPAT 


2002/04/11 








13:06 


- 


296 


((relational adj database) and


USPAT 


2002/04/11 






707/$.ccls.) and api 




13:06 


- 


1778 


trigger$3 near5 filter$1


USPAT 


2002/04/18 










11:26 


- 


15 


(trigger$3 near5 filter$1) and


USPAT 


2002/04/18 






713/$.ccls. 




11:26 


- 


1 


("6178505").PN.


USPAT 


2002/04/22 










08:29 


- 


1 


(("6178505").PN.) and expir$7


USPAT 


2002/04/22 










08:29 


- 


163 


tos same filter$3 


USPAT 


2002/11/06 










15:43 


- 


6 


(tos same filter$3) same (type adj2


USPAT 


2002/11/06 






service$1)




15:45 


- 


4611 


classif$8 same filter$3 


USPAT 


2002/11/06 










15:45 


- 


5 


(classif$8 same filter$3) same tos 


USPAT 


2002/11/06 










15:46 


- 


14 


(classif$8 same filter$3) and tos 


USPAT 


2002/11/06 










15:46 


- 


52 


packet adj classifier$1


USPAT; 


2002/11/14 








EPO; 


16:05 








DERWENT; 










USOCR 




- 


26 


(packet adj classifier$1) and filter$3


USPAT; 


2002/11/14 








EPO; 


16:05 








DERWENT; 










USOCR 






21 


(packet adj classifier$1) and (qos or


USPAT; 


2002/11/14 






tos) 


EPO; 


16:05 








DERWENT; 










USOCR
_


15 


((packet adj classifier$1) and filter$3)


USPAT; 


2002/11/14 






and ((packet adj classifier$1) and (qos


EPO; 


16:14






or tos))


DERWENT; 










USOCR 




_ 


4 


("5636371" | "5729685" | "5802286" |


USPAT


2002/11/14


"5826014").PN.




16:13


- 


68 


witting.inv.


USPAT; 


2002/11/14 






EPO; 


16:14








DERWENT; 










USOCR 




- 


2


witting.inv. and filter$3


USPAT; 


2002/11/14 








EPO; 


16:15








DERWENT; 










USOCR 




- 


1 


witting.inv. and packet$1


USPAT; 


2002/11/14 








EPO; 


16:15








DERWENT; 










USOCR 




- 


253 


barzilai 


USPAT; 


2002/11/14 








EPO; 


16:15 








DERWENT; 










USOCR 




- 


17 


barzilai.inv.


USPAT; 


2002/11/14 








EPO; 


16:16








DERWENT; 










USOCR 






77 


barzilai and packet$1


USPAT; 


2002/11/14 








EPO; 


16:16








DERWENT; 










USOCR 




- 


9 


(barzilai and packet$1) and classif$9


USPAT; 


2002/11/14 








EPO; 


16:18 








DERWENT; 










USOCR 




- 


16 


"5040176" 


USPAT; 


2002/11/14 








EPO; 


16:18








DERWENT ; 










USOCR 




- 


2 


5040176.pn.


USPAT; 


2002/11/14 








EPO; 


16:20 








DERWENT; 










USOCR 




- 


5 


tsipora and barzilai.inv.


USPAT; 


2002/11/14 








EPO; 


16:20 








DERWENT; 










USOCR 




- 


0 


(tsipora and barzilai.inv.) and filter$3


USPAT; 


2002/11/14 








EPO; 


16:21 








DERWENT ; 










USOCR 




- 


0 


(tsipora and barzilai.inv.) and qos


USPAT; 


2002/11/14 








EPO; 


16:21 








DERWENT; 










USOCR 




- 


1 


(tsipora and barzilai.inv.) and


USPAT; 


2002/11/14 






classification 


EPO; 


16:24 








DERWENT ; 










USOCR 




- 


13 


wittig and hartmut 


USPAT; 


2002/11/14 








EPO; 


16:29 








DERWENT; 










USOCR 




- 


3 


(("6084879") or ("6335935") or 


USPAT 


2002/11/14 






("6157955")).PN.




16:31 




522 


packet adj classif$8 


USPAT; 


2003/04/30 








US-PGPUB; 


11:12








EPO; JPO; 










DERWENT ; 










IBM TDB
_


287 


(nic or (network adj (interface or


USPAT; 


2003/04/30 






adapter))) near5 filter$3


US-PGPUB; 


11:13 








EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


60477 


370/$.ccls. 


USPAT; 


2003/04/30 








US-PGPUB; 


11:09 








EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


74 


370/$.ccls. and ((nic or (network adj


USPAT; 


2003/04/30 






(interface or adapter))) near5 filter$3)


US-PGPUB; 


11:10 








EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


8 


(370/$.ccls. and ((nic or (network adj


USPAT; 


2003/04/30 






(interface or adapter))) near5 filter$3))


US-PGPUB; 


11:10






and classif$8 


EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


13 


(packet adj classif$8) and ((nic or 


USPAT; 


2003/04/30 






(network adj (interface or adapter)))


US-PGPUB; 


11:10 






near5 filter$3)


EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


1303 


(nic or (network adj (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3


US-PGPUB; 


11:13 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


1511 


(frame or packet or cell) adj classif$8 


USPAT; 


2003/04/30 








US-PGPUB; 


11:13 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


311 


(nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter))) near5 filter$3


US-PGPUB; 


11:13 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


1466 


(nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3


US-PGPUB; 


11:14 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


15 


((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter))) near5 filter$3) and ((frame or


US-PGPUB; 


11:16 






packet or cell) adj classif$8) 


EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


7 


("5774660" | "5918021" | "6151297" |


USPAT


2003/04/30


"6160544" | "6243360" | "6253334" |


11:15


"6314525").PN.






- 


4 


(("5774660" | "5918021" | "6151297" |


USPAT;


2003/04/30


"6160544" | "6243360" | "6253334" |


US-PGPUB;


11:18


"6314525").PN.) and filter$3


EPO; JPO; 










DERWENT; 










IBM TDB 




- 


25 


((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3)


US-PGPUB; 


12:43 






and ((frame or packet or cell) adj


EPO; JPO; 








classif$8)


DERWENT ; 










IBM TDB 






25 


((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3) 


US-PGPUB; 


12:44 






and ((frame or packet or cell) adj


EPO; JPO; 








classif$8)


DERWENT; 










IBM TDB
_


74 


((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3)


US-PGPUB; 


12:46 






and ((frame or packet or cell) same


EPO; JPO; 








classif$8)


DERWENT; 










IBM TDB 




- 


22 


(((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3) 


US-PGPUB; 


19:20 






and ((frame or packet or cell) same 


EPO; JPO; 








classif$8)) and @ad<19981201


DERWENT; 










IBM TDB 




- 


18 


vaughn.xa. and william and rinehart.xp.


USPAT; 


2003/04/30 






EPO; 


19:15








DERWENT; 










USOCR 




_ 


2 


6389479.pn.


USPAT; 


2003/04/30 






EPO; 


19:15








DERWENT; 










USOCR 




_ 


204094 


6389479.pn. and filter$3 or classif$8


USPAT; 


2003/04/30 






EPO; 


19:16








DERWENT; 










USOCR 




- 


1 


6389479.pn. and filter$3 and classif$8


USPAT; 


2003/04/30 








EPO; 


19:19








DERWENT; 










USOCR 




- 


1 


6078957.pn. and filter$3


USPAT; 


2003/04/30 








EPO; 


19:20 








DERWENT; 










USOCR 




_ 


22 


(((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3)


US-PGPUB; 


19:27 






and ((frame or packet or cell) same


EPO; JPO; 








classif$8)) and @ad<19981201


DERWENT; 










IBM TDB 




- 


1466 


((nic or (network adj2 (interface or


USPAT; 


2003/04/30 






adapter or controller))) same filter$3) 


US-PGPUB; 


19:27 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


8260 


709/22$.ccls.


USPAT; 


2003/04/30 








US-PGPUB; 


19:27 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


132 


709/22$.ccls. and (((nic or (network adj2


USPAT; 


2003/04/30 






(interface or adapter or controller)))


US-PGPUB; 


19:27 






same filter$3))


EPO; JPO; 










DERWENT; 










IBM TDB 




- 


60477 


370/$.ccls.


USPAT; 


2003/04/30 








US-PGPUB; 


19:28 








EPO; JPO; 










DERWENT; 










IBM TDB 




_ 


944 


370/$.ccls. and 709/22$.ccls.


USPAT; 


2003/04/30 








US-PGPUB; 


19:28 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


27 


(370/$.ccls. and 709/22$.ccls.) and


USPAT; 


2003/04/30 






(709/22$.ccls. and (((nic or (network


US-PGPUB; 


19:32 






adj2 (interface or adapter or 


EPO; JPO; 








controller))) same filter$3)))


DERWENT; 










IBM TDB 






36 


(packet adj classif$8) and nic 


USPAT; 


2003/04/30 








US-PGPUB; 


19:33 








EPO; JPO; 










DERWENT; 










IBM TDB
-


4 


((packet adj classif$8) and nic) and


USPAT; 


2003/04/30 






@ad<19981101 


US-PGPUB; 


19:34 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


2 


(((packet adj classif$8) and nic) and


USPAT; 


2003/04/30 






@ad<19981101) and filter$3 


US-PGPUB; 


20:41 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


10 


(policy adj server) same filter$1


USPAT; 


2003/04/30 








US-PGPUB; 


20:44 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


4 


vaughn.xa. and profile and william


USPAT; 


2003/04/30 








US-PGPUB; 


20:52 








EPO; JPO; 










DERWENT ; 










IBM TDB 




- 


1 


6510164.pn. and filter$3


USPAT; 


2003/04/30 








US-PGPUB; 


20:52 








EPO; JPO; 










DERWENT; 










IBM TDB 




- 


6943 


classif$9 same filter$1


USPAT; 


2004/01/16 








EPO; 


08:57 








DERWENT; 










USOCR 




- 


41 


barzilai.inv.


USPAT; 


2004/01/16 








EPO; 


08:57 








DERWENT; 










USOCR 




- 


1 


(classif$9 same filter$1) and


USPAT; 


2004/01/16 






barzilai.inv.


EPO; 


08:57 








DERWENT; 










USOCR 




- 


420945 


(classif$9 same filter$1) samd dynamic$4


USPAT; 


2004/01/16 








EPO; 


08:57 








DERWENT; 










USOCR 




- 


152 


(classif$9 same filter$1) same dynamic$4


USPAT; 


2004/01/16 








EPO; 


08:58 








DERWENT; 










USOCR 




- 


432264 


filter$3 same (expir$8 or delet$4 or 


USPAT; 


2004/01/16 






remov$4 or creat$4) 


EPO; 


08:58 








DERWENT; 










USOCR 




- 


11494 


classif$9 same (packet$1 or message$1 or


USPAT; 


2004/01/16 






frame$1 or datagram$1)


EPO; 


08:59 








DERWENT; 










USOCR 




- 


12709 


admission same (polic$3 or regulat$4 or 


USPAT; 


2004/01/16 






rule$1 or criteria)


EPO; 


09:00 








DERWENT; 










USOCR 




- 


70379 


(traffic or packet$1 or frame$1 or


USPAT; 


2004/01/16 






datagram$1) same (filter$1 or shaping)


EPO; 


09:06 








DERWENT; 










USOCR 




- 


15 


((traffic or packet$1 or frame$1 or


USPAT; 


2004/01/16 






datagram$1) same (filter$1 or shaping))


EPO; 


09:02 






and (admission same (polic$3 or regulat$4 


DERWENT; 








or rule$1 or criteria)) and (classif$9


USOCR 








same (packet$1 or message$1 or frame$1 or










datagram$1)) and (filter$3 same (expir$8










or delet$4 or remov$4 or creat$4)) and










(classif$9 same filter$1)
-


1093 


terrell.inv.


USPAT; 


2004/01/16 








EPO; 


09:02 








DERWENT; 










USOCR 




_ 


2 


terrell.inv. and (nortel.asn. or (bay adj 


USPAT; 


2004/01/16 






network$1.asn.))


EPO; 


09:03 








DERWENT; 










USOCR 




_ 


5748 


(nortel.asn. or (bay adj network$1.asn.))


USPAT; 


2004/01/16 








EPO; 


09:03 








DERWENT; 










USOCR 




- 


10 


((nortel.asn. or (bay adj 


USPAT; 


2004/01/16 






network$l . asn. ) ) ) and (classif$9 same 


EPO; 


09:04 






filter$l) 


DERWENT; 










USOCR 




- 


8 


("5313455" | "5463620" I "5781532" I 


USPAT 


2004/01/16 






"6104700" | "6167027" | "6188698" I 




09:05 






"6222844" I "6381649") . PN. 






_ 


581148 


(traffic or packet$l or frame$l or 


USPAT; 


2004/01/16 






datagram$l or admission) same (filter$l 


EPO; 


09:09 






or shaping or control$4 or policy) 


DERWENT; 










USOCR 




- 


7 


("5546390" | "5873078*' | "6029170" I 


USPAT 


2004/01/16 






"6223174" I "6233574" | "6298340" | 




09:07 






"6396842") -PN. 






_ 


11 


("5243596" I "5406322" I "5509006" | 


USPAT 


2004/01/16 






"5721920" | "5862335" 1 "5890217" I 




09:07 






"6104696" | "6154446" I "6157955" 1 










"6167047" | "6240452") .PN. 






_ 


7 


6279035. URPN. 


USPAT 


2004/01/16 










09:08 




7 


("5600820" | "5828844" | "5878043" | 


USPAT 


2004/01/16 






"5892924" I "5920705" I "5926459" 1 




09:09 






"5949786") . PN. 






_ 


0 


{("5600820" | "5828844" | "5878043" | 


USPAT; 


2004/01/16 






"5892924" | "5920705" | "5926459" I 


EPO; 


09:09 






"5949786") . PN. ) and (classif$9 same 


DERWENT; 








filter$l) 


USOCR 




- 


3 


(("5600820" | "5828844" I "5878043" I 


USPAT; 


2004/01/16 






"5892924" I "5920705" ! "5926459" I 


EPO; 


09:10 






"5 94 9786" ) . PN. ) and (classif$9 same 


DERWENT; 








(packet$l or message$l or frame$l or 


USOCR 








datagram$l) ) 






- 


2 


(({"5600820" | "5828844" | "5878043" | 


USPAT; 


2004/01/16 






"5892924" I "5920705" I "5926459" I 


EPO; 


09: 11 






"5949786") .PN. ) and (classif$9 same 


DERWENT; 








(packet$l or message$l or frame$l or 


USOCR 








datagram$l) ) ) and filter$3 






- 


759 


(filter$3 same (expir$8 or delet$4 or 


USPAT; 


2004/01/16 






remov$4 or creat$4)) and (classif$9 same 


EPO; 


09:11 






filter$l) and ((traffic or packet$l or 


DERWENT; 








frame$l or datagram$l or admission) same 


USOCR 








(filter$l or shaping or control$4 or 










policy) ) 






- 


30 


( (filter$3 same {expir$8 or delet$4 or 


USPAT; 


2004/01/16 






remov$4 or creat$4)) and (classif$9 same 


EPO; 


09:16 






filter$l) and { (traffic or packet$l or 


DERWENT; 








frame$l or datagram$l or admission) same 


USOCR 








(filter$l or shaping or control$4 or 










policy) ) ) and (admission same (polic$3 or 










regulat$4 or rule$l or criteria)) 






- 


34 


6167445. URPN. 


USPAT 


2004/01/16 










09: 15 


- 


13 


6031841. URPN. 


USPAT 


2004/01/16 










09: 16 




152 


( (classif$9 same filter$l) same 


USPAT; 


2004/01/16 






dynamic$4) and (classif$9 same filter$l) 


EPO; 


09:16 








DERWENT; 










USOCR 
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- 


46 
(57) ABSTRACT

A flexible, policy-based, mechanism for managing, 
monitoring, and prioritizing traffic within a network and 
allocating bandwidth to achieve true quality of service 
(QoS) is provided. According to one aspect of the present 
invention, a method is provided for managing bandwidth 
allocation in a network that employs a non-deterministic 
access protocol, such as an Ethernet network. A packet 
forwarding device receives information indicative of a set of 
traffic groups, such as: a MAC address, or IEEE 802.1p
priority indicator or 802.1Q frame tag, if the QoS policy is
based upon individual station applications; or a physical port 
if the QoS policy is based purely upon topology. The packet 
forwarding device additionally receives bandwidth param- 
eters corresponding to the traffic groups. After receiving a 
packet associated with one of the traffic groups on a first 
port, the packet forwarding device schedules the packet for 
transmission from a second port based upon bandwidth 
parameters corresponding to the traffic group with which the 
packet is associated. According to another aspect of the 
present invention, a method is provided for managing band- 
width allocation in a packet forwarding device. The packet 
forwarding device receives information indicative of a set of 
traffic groups. The packet forwarding device additionally 
receives information defining a QoS policy for the traffic 
groups. After a packet is received by the packet forwarding 
device, a traffic group with which the packet is associated is 
identified. Subsequently, rather than relying on an end-to- 
end signaling protocol for scheduling, the packet is sched- 
uled for transmission based upon the QoS policy for the 
identified traffic group. 

28 Claims, 5 Drawing Sheets 
POLICY BASED QUALITY OF SERVICE

This is a continuation of application Ser. No. 09/018,103, 
filed on Feb. 3, 1998, now issued U.S. Pat. No. 6,104,700. 

This application claims the benefit of U.S. Provisional
Application No. 60/057,371, filed Aug. 29, 1997. 

COPYRIGHT NOTICE 

Contained herein is material that is subject to copyright
protection. The copyright owner has no objection to the 
facsimile reproduction of the patent disclosure by any per- 
son as it appears in the Patent and Trademark Office patent 
files or records, but otherwise reserves all rights to the 
copyright whatsoever.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The invention relates generally to the field of computer 
networking devices. More particularly, the invention relates
to a flexible, policy-based mechanism for managing, 
monitoring, and prioritizing traffic within a network and 
allocating bandwidth to achieve true Quality of Service 
(QoS). 

2. Description of the Related Art

Network traffic today is more diverse and bandwidth- 
intensive than ever before. Today's intranets are expected to 
support interactive multimedia, full-motion video, rich 
graphic images and digital photography. Expectations about
the quality and timely presentation of information received
from networks are higher than ever. Increased network speed
and bandwidth alone will not satisfy the high demands of 
today's intranets. 

The Internet Engineering Task Force (IETF) is working
on a draft standard for the Resource Reservation Protocol 
(RSVP), an Internet Protocol-(IP) based protocol that allows 
end-stations, such as desktop computers, to request and 
reserve resources within and across networks. Essentially, 
RSVP is an end-to-end protocol that defines a means of
communicating the desired Quality of Service between 
routers. RSVP is receiver initiated. The end-station that is 
receiving the data stream communicates its requirements to 
an adjacent router and those requirements are passed back to 
all intervening routers between the receiving end-station and
the source of the data stream and finally to the source of the 
data stream itself. Therefore, it should be apparent that 
RSVP must be implemented across the whole network. That 
is, both end-stations (e.g., the source and destination of the 
data stream) and every router in between should be RSVP
compliant in order to accommodate the receiving end- 
station's request. 

While RSVP allows applications to obtain some degree of 
guaranteed performance, it is a first-come, first-served 
protocol, which means if there are no other controls within
the network, an application using RSVP may reserve and 
consume resources that could be needed or more effectively 
utilized by some other mission-critical application. A further 
limitation of this approach to resource allocation is the fact 
that end-stations and routers must be altered to be RSVP
compliant. Finally, RSVP lacks adequate policy mechanisms 
for allowing differentiation between various traffic flows. It 
should be appreciated that without a policy system in place, 
the network manager loses control. 

Recent attempts to facilitate traffic differentiation and
prioritization include draft standards specified by the Insti- 
tute of Electrical and Electronics Engineers (IEEE). The 
IEEE 802.1Q draft standard provides a packet format for an
application to specify which Virtual Local Area Network 
(VLAN) a packet belongs to and the priority of the packet. 
The IEEE 802.1p committee provides a guideline to classify 
traffic based on a priority indicator in an 802.1Q frame tag. 
This allows VLANs to be grouped into eight different traffic 
classes or priorities. The IEEE 802.1p committee does not, 
however, define the mechanism to service these traffic 
classes. 

What is needed is a way to provide true Quality of Service 
("QoS") in a network employing a non-deterministic access 
protocol, such as an Ethernet network, that not only has the 
ability to prioritize and service different traffic classes, but 
additionally provides bandwidth management and guaran- 
tees a quantifiable measure of service for packets associated 
with a particular traffic class. More specifically, with respect 
to bandwidth management, it is desirable to employ a 
weighted fair queuing delivery schedule which shares avail- 
able bandwidth so that high priority traffic is usually sent 
first, but low priority traffic is still guaranteed an acceptable 
minimum bandwidth allocation. Also, it is desirable to 
centralize the control over bandwidth allocation and traffic 
priority to allow for QoS without having to upgrade or alter 
end-stations and existing routers as is typically required by 
end-to-end protocol solutions. Further, it would be advantageous
to put the control in the hands of network managers
by performing bandwidth allocation and traffic prioritization 
based upon a set of manager-defined administrative policies. 
Finally, since there are many levels of control a network 
manager may elect to administer, it is desirable to provide a 
variety of scheduling mechanisms based upon a core set of 
QoS profile attributes. 

BRIEF SUMMARY OF THE INVENTION 

A flexible, policy-based, mechanism for managing, 
monitoring, and prioritizing traffic within a network and 
allocating bandwidth to achieve true Quality of Service 
(QoS) is described. According to one aspect of the present 
invention, a method is provided for managing bandwidth 
allocation in a network that employs a non-deterministic
access protocol. A packet forwarding device receives infor- 
mation indicative of a set of traffic groups. The packet 
forwarding device additionally receives parameters, such as 
bandwidth and priority parameters, corresponding to the 
traffic groups. After receiving a packet associated with one 
of the traffic groups on a first port, the packet forwarding 
device schedules the packet for transmission from a second 
port based upon parameters corresponding to the traffic 
group with which the packet is associated. Advantageously, 
in this manner, a weighted fair queuing schedule that shares 
bandwidth according to some set of rules may be achieved. 

According to another aspect of the present invention, a 
method is provided for managing bandwidth allocation and 
traffic prioritization in a packet forwarding device. The 
packet forwarding device receives information indicative of 
a set of traffic groups. The packet forwarding device additionally
receives information defining a Quality of Service
(QoS) policy for the traffic groups. After a packet is received 
by the packet forwarding device, a traffic group with which 
the packet is associated is identified. Subsequently, rather 
than relying on an end-to-end signaling protocol for 
scheduling, the packet is scheduled for transmission based 
upon the QoS policy for the identified traffic group. 
Therefore, bandwidth allocation and traffic prioritization are
based upon a set of administrative policies over which the 
network manager retains control. 

According to yet another aspect of the present invention, 
a number of QoS queues are provided at each port of the 
packet forwarding device. A current bandwidth metric is
determined for each of the QoS queues for a particular port. 
The QoS queues are divided into two groups based upon 
their respective bandwidth metrics and their respective mini- 
mum bandwidth requirements. Subsequently, the groups are 
used as a first level arbitration mechanism to select a QoS 
queue that will source the next packet. 

Other features of the present invention will be apparent 
from the accompanying drawings and from the detailed 
description which follows. 

BRIEF DESCRIPTION OF THE SEVERAL 
VIEWS OF THE DRAWINGS 

The present invention is illustrated by way of example, 
and not by way of limitation, in the figures of the accom- 
panying drawings and in which like reference numerals refer 
to similar elements and in which: 

FIG. 1A is a simplified block diagram of an exemplary 
switch architecture in which one embodiment of the present 
invention may be implemented. 

FIG. 1B is a logical view of the interaction between
switch processing blocks according to one embodiment of 
the present invention. 

FIG. 2 is a flow diagram illustrating high level bandwidth
management and traffic prioritization processing according 
to one embodiment of the present invention. 

FIG. 3 is a flow diagram illustrating periodic evaluation of 
QoS categories according to one embodiment of the present 
invention.

FIG. 4 is a flow diagram illustrating next packet sched- 
uling according to one embodiment of the present invention. 

DETAILED DESCRIPTION OF THE INVENTION

A flexible, policy-based, mechanism for managing,
monitoring, and prioritizing traffic within a network and
allocating bandwidth to achieve true Quality of Service
(QoS) is described. "Quality of Service" in this context
essentially means that there is a quantifiable measure of the
service being provided. The measure of service being provided
may be in terms of a packet loss rate, a maximum
delay, a committed minimum bandwidth, or a limited maximum
bandwidth, for example.

In the present invention, a number of QoS queues may be
provided at each port of a packet forwarding device, such as
a Local Area Network (LAN) switch. Based upon a set of
QoS parameters, various types of traffic can be distinguished
and associated with particular QoS queues. For example,
packets associated with a first traffic group may be placed
onto a first QoS queue and packets associated with another
traffic group may be placed onto a second QoS queue. When
a port is ready to transmit the next packet, a scheduling
mechanism may be employed to select which QoS queue of
the QoS queues associated with the port will provide the
next packet for transmission.

In the following description, for the purposes of
explanation, numerous specific details are set forth in order
to provide a thorough understanding of the present invention.
It will be apparent, however, to one skilled in the art
that the present invention may be practiced without some of
these specific details. In other instances, well-known structures
and devices are shown in block diagram form.

The present invention includes various steps, which will
be described below. The steps of the present invention may
be performed by hardware components or may be embodied
in machine-executable instructions, which may be used to
cause a general-purpose or special-purpose processor programmed
with the instructions to perform the steps.
Alternatively, the steps may be performed by a combination
of hardware and software. While embodiments of the
present invention will be described with reference to a high
speed Ethernet switch, the method and apparatus described
herein are equally applicable to other types of network
devices or packet forwarding devices.

An Exemplary Switch Architecture

An overview of the architecture of a switch 100 in which
one embodiment of the present invention may be implemented
is illustrated by FIG. 1A. The central memory
architecture depicted includes multiple ports 105 and 110
each coupled via a channel to a filtering/forwarding engine
115. Also coupled to the filtering/forwarding engine 115 are
a forwarding database 120, a packet Random Access
Memory (RAM) 125, and a Central Processing Unit (CPU)
130.

According to one embodiment, each channel is capable of
supporting a data transfer rate of one gigabit per second in
the transmit direction and one gigabit per second in the
receive direction, thereby providing 2 Gb/s full-duplex capability
per channel. Additionally, the channels may be configured
to support one Gigabit Ethernet network connection
or eight Fast Ethernet network connections.

The filtering/forwarding engine 115 includes an address
filter (not shown), a switch matrix (not shown), and a buffer
manager (not shown). The address filter may provide
bridging, routing, Virtual Local Area Network (VLAN)
tagging functions, and traffic classification. The switch
matrix connects each channel to a central memory such as
packet RAM 125. The buffer manager controls data buffers
and packet queue structures and controls and coordinates
accesses to and from the packet RAM 125.

The forwarding database 120 may store information useful
for making forwarding decisions, such as layer 2 (e.g.,
Media Access Control (MAC) layer), layer 3 (e.g., Network
layer), and/or layer 4 (e.g., Transport layer) forwarding
information, among other things. The switch 100 forwards a
packet received at an input port to an output port by
performing a search on the forwarding database using
address information contained within the header of the
received packet. If a matching entry is found, a forwarding
decision is constructed that indicates to which output port
the received packet should be forwarded, if any. Otherwise,
the packet is forwarded to the CPU 130 for assistance in
constructing a forwarding decision.

The packet RAM 125 provides buffering for packets and
acts as an elasticity buffer for adapting between incoming
and outgoing bandwidth differences. Packet buffering is
discussed further below.



Logical View of Exemplary Switch Processing 

FIG. 1B is a logical view of the interaction between
exemplary switch processing blocks that may be distributed 
throughout the switch 100. For example, some of the pro- 
cessing may be performed by functional units within the 
ports of the switch and other processing may be performed 
by the CPU 130 or by the address filter/switch matrix/buffer 
manager 115. In any event, the processing can be concep- 
tually divided into a first group of functions 160 dedicated 
to input processing and a second group of functions 185 
dedicated to output processing. According to the present 
embodiment, the first group 160 includes a comparison 
engine 155, an enqueue block 161, a packet classification
block 150, and a buffer manager 165. The second group 185 
includes a dequeue block 162, a Quality of Service (QoS) 
category evaluation block 175, and a scheduler 170. 

Additionally, a user interface (UI) 145 may be provided 
for receiving various parameters from the network manager. 
The UI may be text based or graphical. In one embodiment, 
the UI 145 may include an in-band HyperText Markup 
Language (HTML) browser-based management tool which 
may be accessed by any standard web browser. In any event, 
the goal of the UI 145 is to separate high-level policy 
components, such as traffic grouping and QoS profiles from 
the details of the internal switch hardware. Thus, user 
configuration time is minimized and a consistent interface is 
provided to the user. 

The UI 145 receives information indicative of one or 
more traffic groups. This information may be provided by 
the network manager. There are several ways to define a 
traffic group. Table 1 below illustrates a variety of traffic 
classification schemes that may be supported by the UI 145. 

TABLE 1

Traffic Classification

Policy Based Upon            Traffic Group Definition     OSI Layer
---------------------------  ---------------------------  ---------------
Applications                 TCP Session                  Transport Layer
                             UDP Session
                             RSVP Flow
Network Layer Topology or    Network Layer Protocol       Network Layer
Groups of Users              Subnet or IP Address
                             VLAN Identifier
End-Station Applications     MAC Address                  Link Layer
                             802.1p or 802.1Q
Physical Topology            Physical Port                Physical Layer



The information used to identify a traffic group typically 
depends upon the terms in which the QoS policy is defined. If the
QoS policy is based on applications, traffic groups may be 
differentiated at the Transport layer by Transmission Control 
Protocol (TCP) session or User Datagram Protocol (UDP) 
session. For example, the network manager may provide 
information indicative of TCP source and destination ports 
and IP source and destination addresses to identify traffic 
groups. However, if the QoS policy is based upon the 
Network layer topology or groups of users, traffic groups
may be more conveniently defined by supplying information
regarding the Network layer protocol, such as Internet
Protocol (IP) or Internetwork Packet Exchange (IPX), the 
subnet or IP addresses, or VLAN identifiers. If the QoS 
policy is defined by end-station applications, then Media 
Access Control (MAC) addresses, IEEE 802.1p priority 
indications, or IEEE 802.1Q frames may be employed to
identify traffic groups. Finally, if the QoS policy is physical 
topology based, physical port identifiers may be used to 
differentiate traffic groups. 
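
By way of illustration only, the association of a received packet with a manager-defined traffic group may be sketched in Python as follows. The Packet fields, the TrafficGroup structure, and the first-match rule are assumptions made for this sketch; the description above does not prescribe a data layout or a matching order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    # Hypothetical header summary; field names are illustrative only.
    in_port: int
    src_mac: str
    vlan_id: Optional[int] = None
    ip_proto: Optional[str] = None   # e.g. "tcp" or "udp"
    dst_port: Optional[int] = None


@dataclass
class TrafficGroup:
    name: str
    match: dict  # field name -> required value; unlisted fields are ignored


def classify(packet, groups):
    """Return the name of the first group whose listed fields all match."""
    for group in groups:
        if all(getattr(packet, f) == v for f, v in group.match.items()):
            return group.name
    return None  # packet belongs to no defined traffic group


groups = [
    TrafficGroup("video", {"ip_proto": "udp", "dst_port": 5004}),  # Transport layer
    TrafficGroup("engineering", {"vlan_id": 10}),                  # Network layer
    TrafficGroup("port3", {"in_port": 3}),                         # Physical layer
]
```

Each group lists only the header fields it cares about, which mirrors the idea that the classification block tells the comparison engine which fields to compare and which to ignore.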

It should be noted that Table 1 merely presents an
exemplary set of traffic group identification mechanisms. 
From the examples presented herein, additional, alternative, 
and equivalent traffic grouping schemes and policy consid- 
erations will be apparent to those of ordinary skill in the art. 
For example, other state information may be useful for 
purposes of packet classification, such as the history of 
previous packets, the previous traffic load, the time of day, 
etc. 

It is appreciated that traffic classifications based upon the
traffic group definitions listed above may result in overlap. 
Should the network manager define overlapping traffic 
groups, the UI 145 may issue an error message and reject the
most recent traffic group definition, the UI 145 may issue a
warning message to the network manager and allow the
more specific traffic group definition to override a conflicting
general traffic group definition, or the UI 145 may be
configured to respond in another manner.
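
The second option above, in which a more specific definition overrides a conflicting general one, might be sketched as follows. Measuring specificity by the number of header fields a definition constrains is an assumption of this sketch, as are the dictionary layout and the tie-break by name; the text does not define how specificity is judged.

```python
def most_specific_group(packet_fields, groups):
    """Pick the matching group that constrains the most header fields.

    packet_fields: dict of header field -> value for one packet.
    groups: dict of group name -> dict of required field values.
    Ties are broken by group name (an arbitrary, illustrative choice).
    """
    matches = [
        (len(rule), name)
        for name, rule in groups.items()
        if all(packet_fields.get(field) == value for field, value in rule.items())
    ]
    return max(matches)[1] if matches else None


overlapping = {
    "all_ip": {"proto": "ip"},                  # general definition
    "web": {"proto": "ip", "dst_port": 80},     # more specific definition
}
```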

A number of QoS queues 180 may be provided at each of
the ports of a packet forwarding device. In one embodiment,
a mapping of traffic groups to QoS queues 180 may be
maintained. As traffic groups are provided by the network
manager, the UI 145 updates the local mapping of traffic
groups to QoS queues 180. This mapping process may be a
one-to-one mapping of the traffic groups defined by the
network manager to the QoS queues 180 or the mapping
process may be more involved. For example, there may be
more traffic groups than QoS queues 180, in which case,
more than one traffic group will be mapped to a single QoS
queue. Some consolidation rules for combining multiple
traffic groups into a single QoS queue will be discussed
below.
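
A minimal sketch of such a mapping is given below. It is one-to-one while the traffic groups fit and otherwise folds the excess groups into the last queue. That folding rule is purely a placeholder assumed for illustration; it is not one of the consolidation rules referred to above.

```python
def map_groups_to_queues(group_names, num_queues):
    """Map each traffic group name to a QoS queue index.

    Groups are taken in the order given. Once the queues run out, all
    remaining groups share the last queue (placeholder rule only).
    """
    if num_queues < 1:
        raise ValueError("a port needs at least one QoS queue")
    return {name: min(i, num_queues - 1) for i, name in enumerate(group_names)}
```

Because callers see only group names, the number of hardware queues can change without any change to the traffic group definitions.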

At any rate, by providing a layer of abstraction in this
manner, the network manager need not be burdened with the
underlying implementation details, such as the number of
QoS queues per port and other queuing parameters. Another
advantage achieved by this layer of abstraction between the
traffic group definitions and the physical QoS queues is the
fact that the UI 145 is now decoupled from the underlying
implementation. Therefore, the UI 145 need not be updated
if the hardware QoS implementation changes. For example,
software providing for traffic group definition need not be
changed simply because the number of QoS queues per port
provided by the hardware changes.

The input data stream is received by the comparison
engine 155 from input switch ports (not shown). Under the
direction of the packet classification block 150, the comparison
engine 155 determines with which of the previously
defined traffic groups a packet in the data stream is associated.
The packet classification block 150 may employ the
traffic group indications provided by the network manager to
provide the comparison engine 155 with information regarding
locations and fields to be compared or ignored within the
header of a received packet, for example. It should be
appreciated that if the comparison required for traffic classification
is straightforward, such as in a conventional packet
forwarding device, then the comparison engine 155 and the
packet classification block 150 may be combined.

The packet classification block 150 in conjunction with
the UI 145 provides a network manager with a flexible
mechanism to control traffic prioritization and bandwidth
allocation through the switch 100. Importantly, no end-to-end
signaling protocol needs to be implemented by the
network devices. For example, the end-station that is to
receive the data stream need not reserve bandwidth on each
of the intermediate devices between it and the source of the
data stream. Rather, a packet forwarding device employing
the present invention can provide some benefit to the network
without requiring routers and/or end-stations to do
anything in particular to identify traffic. Thus, traffic priority
may be enforced by the switch 100 and QoS may be
delivered to applications without altering routers or end-stations.

According to one embodiment, the buffer manager 165
participates in policy based QoS by controlling the allocation
of buffers within the packet RAM 125. Buffers may be
dynamically allocated to QoS queues 180 as needed, within
constraints established by QoS profile attributes, which are
discussed below. The buffer manager 165 may maintain
several programmable variables for each QoS queue. For 
example, a Minimum Buffer Allocation and a Maximum 
Queue Depth may be provided for each QoS queue. The 
Minimum Buffer Allocation essentially reserves some mini- 
mum number of buffers in the packet RAM 125 for the QoS 
queue with which it is associated. The Maximum Queue 
Depth establishes the maximum number of buffers that can 
be placed on a given QoS queue. The buffer manager 165 
also maintains a Current Queue Depth for each QoS queue 
to assure the maximum depth is not exceeded. For example, 
before allowing a buffer to be added to a given QoS queue, 
the buffer manager 165 may compare the Maximum Queue 
Depth to the Current Queue Depth to ensure the Maximum 
Queue Depth is not exceeded. 

Variables are also maintained for tracking free buffers in 
the packet RAM 125. At initialization, a Buffers Free Count 
contains the total number of buffers available in the packet 
RAM 125 and a Buffers Reserved Count contains the sum of 
the minimum buffer allocations for the QoS queues 180. As 
packets are received they are stored in free buffers, and the 
Buffers Free Count is decremented by the number of buffers 
used for such storage. After the appropriate QoS queue has 
been identified the buffer manager 165 instructs the enqueue 
block 161 to add the packet to the QoS queue. The enqueue 
block 161 links the packet to the identified queue provided 
that the Current Queue Depth is less than the Maximum 
Queue Depth and either (1) the Current Queue Depth is less 
than the Minimum Buffer Allocation or (2) the Buffers 
Reserved Count is less than the Buffers Free Count. 
Therefore, if a QoS queue exceeds its reserve of buffers 
(e.g., Minimum Buffer Allocation), to the extent that addi- 
tional buffers remain free, the QoS queue may continue to 
grow. Otherwise, the enqueue block 161 will discard the 
packet, the buffers are returned to the free pool, and the 
Buffers Free Count is increased by the number of buffers that 
would have been consumed by the packet. When a packet is 
successfully linked to a QoS queue, the Current Queue 
Depth for that QoS queue is increased by the number of
buffers used by the packet. If, prior to the addition of the 
packet to the queue, the Current Queue Depth was less than 
the Minimum Buffer Allocation then the Buffers Reserved 
Count is decreased by the lesser of (1) the number of buffers 
in the packet or (2) the difference between the Current 
Queue Depth and the Minimum Buffer Allocation. 
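
The enqueue bookkeeping just described can be sketched as follows. The class and attribute names are hypothetical, and the guard that drops a packet when too few free buffers remain is an assumption added for safety; everything else follows the admission test and counter updates stated in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass
class QosQueue:
    min_alloc: int      # Minimum Buffer Allocation
    max_depth: int      # Maximum Queue Depth
    cur_depth: int = 0  # Current Queue Depth


class BufferManager:
    def __init__(self, total_buffers, queues):
        self.queues = queues
        self.free = total_buffers                             # Buffers Free Count
        self.reserved = sum(q.min_alloc for q in queues)      # Buffers Reserved Count

    def receive(self, queue_index, n):
        """Store an n-buffer packet and try to link it; True if linked."""
        if n > self.free:          # assumption: not stated in the text
            return False
        self.free -= n             # packet is stored in free buffers first
        q = self.queues[queue_index]
        admit = q.cur_depth < q.max_depth and (
            q.cur_depth < q.min_alloc or self.reserved < self.free
        )
        if not admit:
            self.free += n         # discard: buffers return to the free pool
            return False
        if q.cur_depth < q.min_alloc:
            # The queue is consuming part of its own reservation.
            self.reserved -= min(n, q.min_alloc - q.cur_depth)
        q.cur_depth += n
        return True
```

Note that, as in the text, the depth test compares the depth before the packet is added, so a multi-buffer packet may carry a queue past its Maximum Queue Depth by up to the packet's size.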

The QoS category evaluation process 175 separates the 
QoS queues into a plurality of categories based upon a set of 
bandwidth parameters. The scheduler 170 uses the grouping 
provided by the QoS category evaluation process 175 to 
select an appropriate QoS queue for sourcing the next packet 
for a particular port. The evaluation of QoS queue categories 
may be performed periodically or upon command by the 
scheduler 170, for example. Periodic evaluation of QoS 
categories and scheduling is discussed in further detail 
below. 
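
As a rough sketch of how a two-category grouping could drive queue selection: the code below assumes that the first category holds queues currently running below their minimum bandwidth, that such queues are served first, and that relative priority breaks ties within a category. The dictionary keys and the "larger number means higher priority" convention are illustrative assumptions, not details taken from the text.

```python
def pick_next_queue(queues):
    """Select the QoS queue that will source the next packet.

    queues: list of dicts with keys name, current_bw, min_bw,
            priority (larger = more important) and backlog (packets waiting).
    """
    ready = [q for q in queues if q["backlog"] > 0]
    if not ready:
        return None
    # First-level arbitration: prefer queues that have not yet
    # received their minimum bandwidth, if any exist.
    below_minimum = [q for q in ready if q["current_bw"] < q["min_bw"]]
    pool = below_minimum or ready
    # Within a category, the highest relative priority wins.
    return max(pool, key=lambda q: q["priority"])["name"]
```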

Responsive to the scheduler 170 the dequeue block 162 
retrieves a packet from a specified QoS queue. After the 
packet has been transmitted, the buffer variables are 
updated. The Buffers Free Count is increased and the 
Current Queue Depth is decreased by the number of buffers 
utilized to store the packet. If the resulting Current Queue
Depth is less than the Minimum Buffer Allocation, then the 
Buffers Reserved Count is increased by the lesser of the 
number of buffers utilized to store the packet or the differ- 
ence between the Current Queue Depth and the Minimum 
Buffer Allocation. 
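
The counter updates performed after transmission may be sketched as shown below. The dictionary-based state and the function name are hypothetical; the arithmetic follows the paragraph above and is the mirror image of the enqueue-side update.

```python
def release_buffers(state, queue, n):
    """Update the counters after an n-buffer packet has been transmitted.

    state: dict with "free" (Buffers Free Count) and
           "reserved" (Buffers Reserved Count).
    queue: dict with "min_alloc" and "cur_depth" for the sourcing queue.
    """
    state["free"] += n
    queue["cur_depth"] -= n
    if queue["cur_depth"] < queue["min_alloc"]:
        # The queue dropped below its reservation, so give back the
        # part of the reservation that is no longer in use.
        state["reserved"] += min(n, queue["min_alloc"] - queue["cur_depth"])
```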

QoS Profile Attributes 

Setting QoS policy is a combination of identifying traffic 
groups and defining QoS profiles for those traffic groups. 
According to one embodiment, each individual traffic group
may be associated with a QoS profile. However, in alternative
embodiments, multiple traffic groups may share a common
QoS profile. Having described traffic group classification
and identification above, QoS profile attributes (also
referred to as parameters) will now be discussed.

Several queuing mechanisms may be implemented using
one or more of the following parameters associated with a
traffic group: (1) minimum bandwidth, (2) maximum
bandwidth, (3) peak bandwidth, (4) maximum delay, and (5)
relative priority. In general, the minimum, maximum, and
peak bandwidth parameters may be expressed in Mbps, a
percentage of total bandwidth, or any other convenient
representation.

Minimum bandwidth indicates the minimum amount of
bandwidth a particular traffic group needs to be provided
over a defined time period. If the sum of the minimum
bandwidths for all traffic groups defined is less than 100% of
the available bandwidth, then the scheduling processing,
discussed below, can assure that each traffic group will
receive at least the minimum bandwidth requested.

Maximum bandwidth is the maximum sustained bandwidth
the traffic group can realize over a defined time period.
In contrast, peak bandwidth represents the bandwidth a
traffic group may utilize during a particular time interval in
excess of the maximum bandwidth. The peak bandwidth
parameter may be used to limit traffic bursts for the traffic
group with which it is associated. The peak bandwidth also
determines how quickly the traffic group's current bandwidth
will converge to the maximum bandwidth. By providing
a peak bandwidth value that is much higher than the
maximum bandwidth, if sufficient bandwidth is available,
the maximum bandwidth will be achieved relatively quickly.
In contrast, a peak bandwidth that is only slightly higher than
the maximum bandwidth will cause the convergence to the
maximum bandwidth to be more gradual.

Maximum delay specifies a time period beyond which
further delay cannot be tolerated for the particular traffic
group. Packets comprising the traffic group that are forwarded
by the switch 100 are guaranteed not to be delayed
by more than the maximum delay specified.

Relative priority defines the relative importance of a
particular traffic group with respect to other traffic groups.
As will be discussed further below, within the same QoS
category, traffic groups with a higher priority are preferred
over those with lower priorities.

This small set of parameters in combination with the
variety of traffic classification schemes gives a network
manager enormous control and flexibility in prioritizing and
managing traffic flowing through packet forwarding devices
in a network. For example, the QoS profile of a video traffic
group, identified by UDP session, might be defined to have
a high priority and a minimum bandwidth of 5 Mbps, while
the QoS profile of an engineering traffic group, identified by
VLAN, may be set to a second priority, a minimum bandwidth
of 30 Mbps, a maximum bandwidth of 50 Mbps, and
a peak bandwidth of 60 Mbps. Concurrently, the QoS profile
of a World Wide Web (WWW) traffic group, identified by
protocol (e.g., IP), may be set to have a low priority, a
minimum bandwidth of 0 Mbps, a maximum bandwidth of
100%, and a peak bandwidth of 100%.
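
The three example profiles above can be written down as a small data structure. This is an illustrative sketch only: the class name, field names, the numeric priority encoding (larger means more important), and the 100 Mbps link speed used to convert percentages are all assumptions, not part of the specification. Attributes the example leaves unspecified are left as None.

```python
from dataclasses import dataclass
from typing import Optional

LINK_MBPS = 100.0  # assumed link speed, used only to turn "100%" into Mbps


@dataclass(frozen=True)
class QosProfile:
    priority: int                          # assumed encoding: larger = more important
    min_bw_mbps: float = 0.0
    max_bw_mbps: Optional[float] = None    # None = not specified in the example
    peak_bw_mbps: Optional[float] = None
    max_delay_ms: Optional[float] = None


def percent(p: float) -> float:
    """Convert a percentage of the assumed link speed to Mbps."""
    return LINK_MBPS * p / 100.0


PROFILES = {
    # video, identified by UDP session: high priority, 5 Mbps minimum
    "video": QosProfile(priority=3, min_bw_mbps=5.0),
    # engineering, identified by VLAN: second priority, 30/50/60 Mbps
    "engineering": QosProfile(priority=2, min_bw_mbps=30.0,
                              max_bw_mbps=50.0, peak_bw_mbps=60.0),
    # WWW, identified by protocol: low priority, 0 Mbps minimum, 100% max and peak
    "www": QosProfile(priority=1, min_bw_mbps=0.0,
                      max_bw_mbps=percent(100), peak_bw_mbps=percent(100)),
}
```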

Consolidation Rules 

It was mentioned earlier that multiple traffic groups may
be mapped to a single QoS queue. This may be accomplished
by maintaining an independent set of variables (e.g.,
minimum bandwidth, maximum bandwidth, peak
bandwidth, maximum delay, and relative priority) for each
QoS queue in addition to those already associated with each
traffic group and following the general consolidation rules
outlined below.

Briefly, when the mapping from traffic groups to QoS
queues is one-to-one, the determination of a particular QoS
queue's attributes is straightforward. The QoS queue's
attributes simply equal the traffic group's attributes.
However, when combining multiple traffic groups that do 
not share a common QoS profile onto a single QoS queue, 
the following general consolidation rules are suggested: (1) 
add minimum attributes of the traffic groups being combined 
to arrive at an appropriate minimum attribute for the target 
QoS queue (e.g., the QoS queue in which the traffic will be 
merged), (2) use the largest of maximum attributes to arrive 
at an appropriate value for a maximum attribute for the 
target QoS queue, and (3) avoid merging traffic groups that 
have different relative priorities. This last rule suggests the 
number of priority levels provided should be less than or 
equal to the number of QoS queues supported by the 
implementation to assure traffic groups with different pri- 
orities are not combined in the same QoS queue. 
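
The three consolidation rules can be sketched in code. The following is a minimal illustration, assuming each traffic group's profile is a dictionary with hypothetical keys "min_bw", "max_bw", and "priority"; only the bandwidth attributes are consolidated here, and a mismatch in relative priority is treated as an error because rule (3) advises against merging such groups.

```python
def consolidate(profiles):
    """Combine the profiles of traffic groups that are merged onto one QoS queue.

    Rule 1: minimum attributes are added.
    Rule 2: the largest maximum attribute is used.
    Rule 3: groups with different relative priorities are not merged.
    """
    priorities = {p["priority"] for p in profiles}
    if len(priorities) != 1:
        raise ValueError("traffic groups with different priorities should not be merged")
    return {
        "min_bw": sum(p["min_bw"] for p in profiles),
        "max_bw": max(p["max_bw"] for p in profiles),
        "priority": priorities.pop(),
    }


# Two hypothetical groups of equal priority sharing one queue (values in Mbps).
merged = consolidate([
    {"min_bw": 5, "max_bw": 50, "priority": 2},
    {"min_bw": 30, "max_bw": 60, "priority": 2},
])
print(merged)
```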

Importantly, when a network manager has determined that 
multiple traffic groups will share a common QoS profile, the 
consolidation rules need not apply, as the network manager 
has already, in effect, manually consolidated the parameters. 

Bandwidth Management and Traffic Prioritization 

Having described an exemplary environment in which 
one embodiment of the present invention may be 
implemented, bandwidth management and traffic prioritiza- 
tion will now be described with reference to FIG. 2. FIG. 2 
is a flow diagram illustrating the high level bandwidth 
management and traffic prioritization processing according 
to one embodiment of the present invention. In this 
embodiment, at step 210, a manager-defined QoS policy 
may be received via the UI 145, for example. The QoS 
policy is a combination of traffic groups and QoS profile 
attributes corresponding to those traffic groups. 

At step 220, a packet is received by the switch 100. Before 
the packet can be placed onto a QoS queue for transmission, 
the traffic group to which the packet belongs is identified at 
step 230. Typically, information in the packet header, for 
example, can be compared to the traffic group criteria 
established by the network manager to identify the traffic 
group to which the packet belongs. This comparison or
matching process may be achieved by programming filters
in the switch 100 that allow classification of traffic. According
to one embodiment, the packet may be identified using
the traffic group definitions listed in Table 1.

At step 250, enqueue processing is performed. The packet 
is added to the rear of the appropriate QoS queue for the 
identified traffic group. Importantly, if a maximum delay has 
been assigned to the traffic group with which the packet is 
associated, then the packet should either be dropped or 
transmitted within the period specified. According to one 
embodiment, this may be accomplished by limiting the 
depth (also referred to as length) of the corresponding QoS 
queue. Given the minimum bandwidth of the QoS queue and 
the maximum delay the traffic group can withstand, a 
maximum depth for the QoS queue can be calculated. If the 
QoS queue length remains less than or equal to the maxi- 
mum length, then the packet is added to the QoS queue. 
However, if the QoS queue length would exceed the maxi- 
mum length by the addition, then the packet is dropped. 
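
The depth limit described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: queue depth is measured in bytes, the queue is assumed to drain at no less than its minimum bandwidth, and the function and parameter names are hypothetical. Under those assumptions the maximum depth is the minimum bandwidth multiplied by the maximum delay.

```python
from collections import deque


def max_queue_depth_bytes(min_bw_bps, max_delay_us):
    """Bytes that can drain within max_delay_us when the queue is served
    at min_bw_bps (bits per second). Integer arithmetic avoids rounding."""
    return min_bw_bps * max_delay_us // 8_000_000


def try_enqueue(queue, packet, depth_limit_bytes):
    """Append the packet to the rear of the queue unless doing so would
    exceed the depth limit, in which case the packet is dropped."""
    current_depth = sum(len(p) for p in queue)
    if current_depth + len(packet) > depth_limit_bytes:
        return False  # dropped
    queue.append(packet)
    return True


# Hypothetical numbers: 8 Mbps minimum bandwidth, 1 ms maximum delay.
limit = max_queue_depth_bytes(8_000_000, 1_000)
q = deque()
print(limit, try_enqueue(q, bytes(600), limit), try_enqueue(q, bytes(600), limit))
```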

At step 260, scheduling is performed. The scheduling/
dequeuing processing involves determining the appropriate
QoS queue group, selecting the appropriate QoS queue
within that QoS queue group, and removing the packet at the
front of the selected QoS queue. This selected packet will be
the next packet the port transmits. Scheduling will be
discussed further below.

Evaluation of QoS Categories 

According to one embodiment of the present invention, it
is advantageous to divide the QoS queues into at least two
categories. The categories may be defined based upon the
maximum bandwidth, the minimum bandwidth, the peak
bandwidth, and the "current bandwidth." The current bandwidth
should not be mistaken for a bandwidth at an instant
in time; rather, the current bandwidth is a moving average
that is updated periodically upon the expiration of a predetermined
time period. Empirical data suggests this predetermined
time period should be on the order of ten packet
times, wherein a packet time is the time required to transmit
a packet. However, depending upon the environment and the
nature of the traffic, a value in the range of one to one
hundred packet times may be more suitable.

The members of the first category ("Category A") are
those QoS queues which have a current bandwidth that is
below their peak bandwidth and below their minimum
bandwidth. Members of the second category ("Category B")
include those QoS queues that have a current bandwidth that
is greater than or equal to their minimum bandwidth, but less
than both their maximum bandwidth and their peak bandwidth.
The remaining QoS queues (e.g., those having a
current bandwidth that is greater than or equal to either the
peak bandwidth or the maximum bandwidth) are ineligible
for transmission. These QoS queues that are ineligible for
transmission can be considered a third category ("Category
C"). With this overview of QoS categories, an exemplary
process for periodic evaluation of QoS categories will now
be described.
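
The three category definitions above translate directly into a comparison function. The sketch below is illustrative; the function name and argument order are assumptions, and all four values are assumed to be in the same unit.

```python
def categorize(curr_bw, min_bw, max_bw, peak_bw):
    """Return "A", "B", or "C" for a QoS queue, following the category
    definitions in the text (all bandwidths in the same unit)."""
    if curr_bw < peak_bw and curr_bw < min_bw:
        return "A"  # below both peak and minimum bandwidth
    if curr_bw >= min_bw and curr_bw < max_bw and curr_bw < peak_bw:
        return "B"  # minimum satisfied, still under maximum and peak
    return "C"      # at or above peak or maximum: ineligible for transmission


# Hypothetical queue with min 30, max 50, peak 60 (Mbps).
for curr in (10, 40, 50):
    print(curr, categorize(curr, 30, 50, 60))
```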

FIG. 3 is a flow diagram illustrating periodic evaluation of
QoS categories according to one embodiment of the present
invention. In this embodiment, at step 310, processing loops
until the predetermined evaluation time period has expired.
For example, a test may be performed to determine if the
current time is greater than or equal to the last evaluation
time plus the predetermined evaluation time interval.
Alternatively, the evaluation process may be triggered by an
interrupt. In any event, when it is time to evaluate the QoS
queue categorization, processing continues with step 330.

It will be appreciated that the time interval chosen for the
predetermined evaluation time period should not be too long
or too short. If the time interval is too long, one QoS queue
might be allowed to monopolize the link until its maximum
bandwidth is achieved while other QoS queues remain idle.
If the time interval is too short, transmitting a single packet
or remaining idle for a single packet time may cause the QoS
queue to become a member of a different QoS category (e.g.,
the single transmission may cause the current bandwidth to
exceed the maximum bandwidth or the single idle time may
cause the current bandwidth to fall below the minimum
bandwidth) because the moving average moves very quickly
over short time intervals.
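
One common way to realize a periodically updated moving average of this kind is an exponentially weighted update. The sketch below assumes that form, with weights (w-1)/w on the previous average and 1/w on the bandwidth actually received in the last interval; the function name and the default w = 16 are illustrative assumptions.

```python
def update_current_bw(prev_curr_bw, actual_bw, w=16):
    """One evaluation-interval update of the moving-average current bandwidth.

    prev_curr_bw: current bandwidth computed in the previous interval
    actual_bw:    bandwidth the queue actually received in that interval
    w:            smoothing parameter (assumed); larger w responds more slowly
    """
    return prev_curr_bw * (w - 1) / w + actual_bw / w


# A queue that goes from idle to a sustained 100 (e.g., percent of the link).
bw = 0.0
for _ in range(200):
    bw = update_current_bw(bw, 100.0)
print(round(bw, 3))
```

A larger w makes the metric less sensitive to a single transmission or a single idle interval, which is the trade-off discussed in the paragraph above.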

At step 330, the current bandwidth for a particular QoS
queue is set to the current bandwidth for that QoS queue as
calculated in the previous time interval multiplied by a first
weighting factor plus the actual bandwidth that particular
QoS queue received in the previous time interval multiplied
by a second weighting factor, wherein the weighting factors
may be selected to achieve the desired level of responsiveness
in the current bandwidth metric. For example, it may be
desirable to have the current bandwidth converge to within
a certain percentage of a sustained bandwidth if that bandwidth
has been sustained for a certain amount of time.
Exemplary weighting factors are in the form (w-1)/w and
1/w, respectively. Using weighting factors of 15/16 for the first
weighting factor and a value of 1/16 for the second weighting
factor, for example, the current bandwidth will reflect 50%
of a step within 13 time intervals, 80% of a step within 27
time intervals, and will be within 2% of the sustained
bandwidth in approximately 63 time intervals (assuming a
maximum and peak bandwidth of 100%). Alternative ratios
and current bandwidth metrics will be apparent to those of
ordinary skill in the art.

After the current bandwidth has been evaluated for a QoS
queue, at step 340, the QoS queue bandwidth parameters can
be compared to the current bandwidth to determine to which
QoS category the QoS queue belongs. As described above,
if (CURR_BW < PEAK_BW) and (CURR_BW < MIN_BW),
then the QoS queue is associated with Category A at
step 350. If (CURR_BW >= MIN_BW) and ((CURR_BW < MAX_BW)
and (CURR_BW < PEAK_BW)), then
the QoS queue is associated with Category B at step 360. If
(CURR_BW >= PEAK_BW) or (CURR_BW >= MAX_BW),
then the QoS queue is associated with Category C at
step 370.

At step 380, if all of the QoS queues have been evaluated,
then processing branches to step 310; otherwise, processing
continues with step 330.

Scheduling Processing

Briefly, at each port, three levels of arbitration may be
employed to select the appropriate QoS queue from which to
transmit the next packet. The first level of arbitration selects
among the QoS categories. Category A is given priority if
any member QoS queues have one or more pending packets.
Otherwise, a QoS queue with one or more pending packets
of Category B is selected. According to one embodiment, the
relative priority assigned to each QoS queue may be used as
a second level of arbitration. In this manner, when multiple
QoS queues satisfy the first level arbitration, a higher
priority QoS queue is favored over a lower priority QoS
queue. Finally, when there is a tie at the second level of
arbitration (e.g., two or more QoS queues in the same QoS
category have the same relative priority), a round robin or
least recently used (LRU) scheme may be employed to
select from among the two or more QoS queues until the
QoS categories are evaluated.

Assuming a periodic evaluation of QoS categories is
being performed, the scheduling processing need not include
such evaluation and the scheduling processing may be
performed as illustrated by FIG. 4, according to one embodiment
of the present invention. In the embodiment depicted,
at step 410, processing loops until the port associated with
the group of QoS queues being evaluated indicates it is ready
to receive the next packet for transmission. For example, the
port may be polled to determine its transmission status.
Alternatively, the scheduling process may be triggered by an
interrupt. In any event, when the port is ready for the next
packet, processing continues with step 420.

At step 420, a QoS category is selected from which a QoS
queue will provide the next packet for transmission. As
described above, priority is given to the category containing
QoS queues with pending data that are below the peak
bandwidth and minimum bandwidth (e.g., Category A).
However, if no QoS queues meet this criteria, Category B is
selected.

At step 430, if multiple QoS queues are members of the
selected QoS category, processing continues with step 440;
otherwise, processing branches to step 470.

At step 440, the relative priorities of the QoS queues are
used to select among the QoS queues of the selected
category that have pending data.

At step 450, if two or more QoS queues have the same
priority, then processing continues with step 460. Otherwise,
if a QoS queue is found to have the highest relative priority,
then processing branches to step 470.

At step 460, the tie is resolved by performing round robin
or LRU scheduling. That is, until the QoS categories are
evaluated, the QoS queues having the same priority will be
rotated through in a predetermined order or scheduled such
that the QoS queue that has not provided a packet for
transmission recently will be given such an opportunity.
After selecting a QoS queue in this manner, processing
continues with step 470.

At step 470, a packet is dequeued from the selected QoS
queue and the packet is transmitted by the port at step 480.
This scheduling process may be repeated by looping back to
step 410, as illustrated.

Queuing Schemes

A variety of different queuing mechanisms may be implemented
using various combinations of the QoS profile
attributes discussed above. Table 2 below illustrates how to
achieve exemplary queuing mechanisms and corresponding
configurations of the QoS profile attributes.

TABLE 2

Queuing Mechanism Configuration

Queuing Mechanism            QoS Profile Attribute Value

Strict Priority Queuing      Minimum Bandwidth - 0%
                             Maximum Bandwidth - 100%
                             Peak Bandwidth - 100%
                             Maximum Delay - N/A
                             Relative Priority - PRIORITY_i

Round Robin/Least            Minimum Bandwidth - 0%
Recently Used Queuing        Maximum Bandwidth - 100%
                             Peak Bandwidth - 100%
                             Maximum Delay - N/A
                             Relative Priority - <same for all queues>

Weighted Fair Queuing        Minimum Bandwidth - >0%
                             Maximum Bandwidth - MAX_BW_i
                             Peak Bandwidth - PEAK_BW_i
                             Maximum Delay - N/A
                             Relative Priority - <same for all queues>

PRIORITY_i represents a programmable priority value for
a particular QoS queue, i. Similarly, MAX_BW_i and
PEAK_BW_i represent programmable maximum bandwidths
and peak bandwidths, respectively, for a particular
QoS queue, i.

For a strict priority scheme, each QoS queue's minimum
bandwidth is set to zero percent, each QoS queue's maximum
bandwidth is set to one hundred percent, and each QoS
queue's peak is set to one hundred percent. In this manner,
the current bandwidth will never be less than the minimum
bandwidth, and the current bandwidth will never exceed
either the peak bandwidth or the maximum bandwidth. In
this configuration, all QoS queues will be associated with
Category B since no QoS queues will satisfy the criteria of
either Category A or Category C. Ultimately, by configuring
the QoS profile attributes in this manner, the second level of
arbitration (e.g., the relative priority of the QoS queues)
determines which QoS queue is to source the next packet.

For a pure round robin or least recently used (LRU)
scheme, the QoS profile attributes are as above, but additionally
all QoS queue priorities are set to the same value. In
this manner, the third level of arbitration determines which
QoS queue is to source the next packet.

Finally, weighted fair queuing can be achieved by
assigning, at least, a value greater than zero percent to the
desired minimum bandwidth. By assigning a value greater
than zero to the minimum bandwidth parameter, the particular
QoS queue is assured to get at least that amount of
bandwidth on average because the QoS queue will be
associated with Category A until at least its minimum
bandwidth is satisfied. Additionally, different combinations
of values may be assigned to the peak and maximum
bandwidths to prevent a particular QoS queue from monopolizing
the link.

Alternative Embodiments

While evaluation of QoS categories has been described
above as occurring periodically, this evaluation may also be
triggered by the occurrence of a predetermined event.
Alternatively, evaluation of QoS categories may take place
as part of the scheduling processing rather than as part of a
separate periodic background process.

While a relationship between the number of priority levels
and the number of QoS queues has been suggested above, it
is appreciated that the number of QoS queues may be
determined independently of the number of priority levels.
Further, it is appreciated that the number of QoS queues
provided at each port may be fixed for every port or
alternatively a variable number of QoS queues may be
provided for each port.

Finally, in alternative embodiments, weighting factors
and ratios other than those suggested herein may be used to
adjust the current bandwidth calculation for a particular
implementation.

In the foregoing specification the invention has been
described with reference to specific embodiments thereof. It
will, however, be evident that various modifications and
changes may be made thereto without departing from the
broader spirit and scope of the invention. The specification
and drawings are, accordingly, to be regarded in an illustrative
rather than a restrictive sense.

What is claimed is:

1. A method for bandwidth management in a packet
forwarding device, comprising:
  identifying a quality of service (QoS) metric corresponding
  to a traffic group, the QoS metric defining a
  minimum QoS for the traffic group;
  receiving a data packet associated with the traffic group;
  placing the data packet into one of a plurality of queues;
  identifying a current measure of network performance
  with respect to parameters specified in the QoS metric;
  and
  removing the data packet from the queue if a difference
  between the current measure and the minimum QoS
  falls within a threshold.

2. The method of claim 1 wherein identifying the QoS
metric corresponding to a traffic group further comprises:
  identifying the traffic group through an Internet Protocol
  (IP) subnet membership identifier; and
  determining a corresponding QoS metric defining a minimum
  QoS for the traffic group.

3. The method of claim 1 wherein identifying the QoS
metric corresponding to a traffic group further comprises:
  identifying the traffic group through a media access
  control (MAC) address; and
  determining a corresponding QoS metric defining a minimum
  QoS for the traffic group.

4. The method of claim 1 wherein identifying the QoS
metric corresponding to a traffic group further comprises:
  identifying the traffic group through a virtual local area
  network (VLAN) identifier; and
  determining a corresponding QoS metric defining a minimum
  QoS for the traffic group.

5. The method of claim 1 wherein identifying the QoS
metric comprises receiving information indicating a minimum
bandwidth for the traffic group.

6. The method of claim 1 wherein identifying the QoS
metric comprises receiving information indicating a maximum
sustained bandwidth for the traffic group.

7. The method of claim 6 wherein identifying the QoS
metric comprises receiving information indicating a peak
bandwidth representing a bandwidth in excess of the maximum
sustained bandwidth that the traffic group can utilize.

8. The method of claim 1 wherein identifying the QoS
metric comprises receiving information indicating a maximum
allowable delay for the traffic group.

9. The method of claim 1 wherein identifying the QoS
metric comprises receiving information indicating a relative
priority associated with the traffic group.

10. The method of claim 1 wherein determining a current
measure of network performance occurs at specified intervals
of time.

11. The method of claim 1 wherein determining a current
measure of network performance with respect to parameters
specified in the QoS metric comprises calculating the current
measure for the parameters specified in the QoS metric.

12. The method of claim 1 wherein receiving the data
packet comprises receiving the data packet on a first port of
a plurality of ports, and wherein removing the data packet
from the queue comprises transmitting the data packet from
a second port of the plurality of ports.

13. The method of claim 1 wherein the packet forwarding
device employs a non-deterministic access protocol.

14. The method of claim 13 wherein the non-deterministic
access protocol employed by the packet forwarding
device is the Carrier Sense Multiple Access with
Collision Detection (CSMA/CD) protocol.

15. An article of manufacture comprising a machine
accessible medium having content that when accessed provides
instructions to cause an electronic system to:
  identify a quality of service (QoS) metric corresponding
  to a traffic group, the QoS metric defining a minimum
  QoS for the traffic group;
  receive a data packet associated with the traffic group;
  place the data packet into one of a plurality of queues;
  identify a current measure of network performance with
  respect to parameters specified in the QoS metric; and
  remove the data packet from the queue if a difference
  between the current measure and the minimum QoS
  falls within a threshold.

16. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric corresponding to a traffic group
further comprises the content to provide instructions to
cause the electronic system to:
  identify the traffic group through an Internet Protocol (IP)
  subnet membership identifier; and
  determine a corresponding QoS metric defining a minimum
  QoS for the traffic group.

17. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric corresponding to a traffic group
further comprises the content to provide instructions to
cause the electronic system to:
  identify the traffic group through a media access control
  (MAC) address; and
  determine a corresponding QoS metric defining a minimum
  QoS for the traffic group.

18. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric corresponding to a traffic group
further comprises the content to provide instructions to
cause the electronic system to:
  identify the traffic group through a virtual local area
  network (VLAN) identifier; and
  determine a corresponding QoS metric defining a minimum
  QoS for the traffic group.

19. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric comprises the content to provide
instructions to cause the electronic system to receive information
indicating a minimum bandwidth for the traffic
group.

20. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric comprises the content to provide
instructions to cause the electronic system to receive information
indicating a maximum sustained bandwidth for the
traffic group.

21. The article of manufacture of claim 20 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric comprises the content to provide
instructions to cause the electronic system to receive information
indicating a peak bandwidth representing a bandwidth
in excess of the maximum sustained bandwidth that
the traffic group can utilize.

22. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric comprises the content to provide
instructions to cause the electronic system to receive information
indicating a maximum allowable delay for the traffic
group.

23. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to identify the QoS metric comprises the content to provide
instructions to cause the electronic system to receive information
indicating a relative priority associated with the
traffic group.

24. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to determine a current measure of network performance
comprises the content to provide instructions to cause the
electronic system to determine the current measure at specified
intervals of time.

25. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to determine a current measure of network performance with
respect to parameters specified in the QoS metric comprises
the content to provide instructions to cause the electronic
system to calculate the current measure for the parameters
specified in the QoS metric.

26. The article of manufacture of claim 15 wherein the
content to provide instructions to cause the electronic system
to receive the QoS metric comprises the content to provide
instructions to cause the electronic system to receive the data
packet on a first port of a plurality of ports, and wherein the
content to provide instructions to cause the electronic system
to remove the data packet from the queue comprises the
content to provide instructions to cause the electronic system
to transmit the data packet from a second port of the plurality
of ports.

27. The article of manufacture of claim 15 wherein the
packet forwarding device employs a non-deterministic
access protocol.

28. The article of manufacture of claim 27 wherein the
non-deterministic access protocol employed by the packet
forwarding device is the Carrier Sense Multiple Access with
Collision Detection (CSMA/CD) protocol.

* * * * *
ABSTRACT



A policy engine for handling incoming data packets. The 
policy engine includes a stream classification module, a data 
packet input/output module, and a policy enforcement mod- 
ule. The policy enforcement module further includes a 
packet scheduler, an on-chip packet buffer circuitry, and a 
plurality of action processors. The stream classification 
module creates a packet service header for each data packet, 
wherein the packet service header indicates policies to be 
enforced for that data packet. The action processors enforce 
the policies. 
POLICY ENGINE ARCHITECTURE

RELATED APPLICATIONS 

This application claims the benefit of priority to U.S. 
Provisional Patent Application Ser. No. 60/112,859, filed
Dec. 17, 1998. 

TECHNICAL FIELD

The present invention relates to policy-based network 
equipment and, in particular, to policy-based network equip- 
ment that employs a favorable division of hardware and 
software to provide both performance and flexibility. 

BACKGROUND 

Some typical policy-based computer network applications 
are Virtual Private Networks (VPN), Firewall, Traffic 
Management, Network Address Translation, Network 
Monitoring, and TOS Marking. In general, the policy-based 
application has access to the network media through an 
operating system driver interface. In a typical network 
architecture, the policy-based application examines every 
packet coming in from the network along the data path, 
compares it against flow classification criteria, and performs 
the necessary actions based upon the policies defined in a 
policy database. 

Today's policy-based applications are challenged with 
several key issues. These issues can be major inhibitors for 
the future growth of the emerging industry: 

1) Flow classification overhead: Flow classification specifications
can be complicated and lengthy for each network
service. As can be seen from FIG. 1, in a conventional
policy-based application, each packet is compared with
potentially hundreds of rules in order to find the matching 
one and determine the proper action specifications. With 
stateful applications, state tracking is even more time 
consuming. Multiple network services on a single system 
simply make matters worse. 

As is also shown in FIG. 1, the process of flow classifi- 
cation and action processing may repeat for many 
iterations as multiple policies are activated at the same 
time. For example, a VPN (virtual private network) 
application may comprise Firewall Policy, IPSEC 
Policy, IPCOMP (IP compression) policy, NAT 
(Network Address Translation) Policy, QoS (Quality of 
Service) policy, Monitoring Policy, L2TP/PPTP (L2 
Tunnel Protocol/Point To Point Tunnel Protocol) Tun- 
nel Policy, and so on. 

The flow classification is a rule based operation that can 
be very flexible to tune to application needs. For 
example, it may define a rule to identify packets with 
a pattern of any random byte within a packet, and/or 
across many packets. The flow classifiers may also 
differ per action processor for performance optimiza- 
tion. As a result the matching criteria used by a flow 
classifier to classify a flow may include a specific value, 
a range, or wildcard on interface port numbers, 
protocols, IP addresses, TCP ports, applications, application
data, or any user specifiable criteria. The distinctions
among various implementations make it difficult to
cache a flow with its decision in many ways.

2) Flow classification technique is evolving: Flow classification
and analysis technique is more than just looking
into the packet's address, port number and protocol type
and/or other header information. It often involves state
tracking for newer applications. This technique is being
continuously modified and, therefore, is not practically
appropriate for a hardware based implementation.
Furthermore, flow classification techniques are often
viewed as key differentiators between vendors.

3) Action execution speed: Once the classification process
is complete, the proper actions need to be executed. Some
of the actions are simple, like a discard or forwarding
decision for a firewall, while some others are extremely
time consuming, like triple-DES encryption and SHA
hashing algorithm or QOS scheduling algorithm. Software-based
implementations cannot keep up with the
bandwidth expansion as newer and faster media technologies
are employed.
4) Integrated services: As more and more policy-based
applications become available, it is desirable to provide
integrated services on a single platform because this
ostensibly reduces policy management complexity,
avoids potential policy conflicts, and lowers the TCO
(Total Cost of Ownership). On the other hand, integrated
services impose a very large computing power requirement
that cannot be practically achieved with off-the-shelf
general purpose machines. A disadvantage of the conventional
architecture is that, because it is primarily software-based,
it has relatively high overhead. However, precisely
because it is software-based, it is quite flexible.

What is desired is a policy architecture that has the flexibility 
of present flow classification systems, but that also has lower 
overhead. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a block diagram illustrating conventional flow 
classification and action processing. 

FIG. 2 is a block diagram illustrating a broad aspect 
of a policy architecture in accordance with an embodiment 
of the invention. 

FIG. 3 is a block diagram illustrating details in accordance 
with one embodiment of FIG. 2. 

FIG. 4 illustrates more details of how the FIG. 3 archi- 
tecture is employed to process network data. 

FIGS. 5 and 6 illustrate headers added to the packets by 
the stream classification module for use by the action 
processors. 

DETAILED DESCRIPTION 

As shown broadly in FIG. 2 and in greater detail in FIG. 
3, in accordance with one embodiment of the invention, an 
architecture 100 for applying policies to network data traffic 
allocates the application of policies between software and 
hardware such that the system is flexible yet efficient. 

The architecture 100 includes three major components — a 
Policy-Based Application 102, a Policy engine API 104 
("API" stands for "Application Program Interface") and a 
Policy engine 106. As can be seen from FIGS. 2 and 3, the 
policy-based application 102 — such as a firewall, virtual 
private network (VPN), or traffic management — is typically 
a "legacy" software program residing on a host, equipped 
with its own policy database 202 and flow classifier logic 
204. 

The policy engine API 104 serves as an interface between 
the policy application 102 and the policy engine 106 (via a 
system bus 105). The policy engine 106 is purpose-built 
hardware (preferably running at wire speed) that operates on 
input network traffic and network policies and that outputs 
regulated traffic flows based upon the network policies. 

In a typical embodiment, the policy engine API 104 
provides the policy-based application 102 access to all the 
media I/O through a generic operating system driver inter- 
face. In addition, the API 104 allows the application 102 to 
invoke acceleration functions (shown in FIG. 3 as action 
processors 206, or "AP's") provided by the policy 
engine 106. The action processors 206 operate based on 
the stream classifier 207 of the policy engine 106 determin- 
ing that a packet belongs to a particular stream and activat- 
ing the appropriate action processors 206 according to action 
specifications 210 in a policy cache 209. That is, overall 
system performance is enhanced by virtue of the appropriate 
acceleration functions (action processors 206) of the policy 
engine 106 being activated to regulate the network traffic. 

Before proceeding, several terms are defined in the con- 
text of FIGS. 2 and 3. The definitions provided herein are 
meant to be explanatory, and not necessarily limiting when 
a similar or identical term is used in the claims. 
Service 

A service in a policy-based network defines a network 
application 102 that is controlled and managed based on a 
set of policies. Typical services are firewall, VPN, traffic 
management, network address translation, network 
monitoring, etc. 
Policy 

Policies (normally defined by network managers) are 
collectively stored in a policy database 202 accessible to the 
policy-based applications 102 (even conventionally) and 
describe network traffic behaviors based upon business 
needs. A policy specifies both what traffic is to be subject to 
control and how the traffic is to be controlled. Thus, a policy 
typically has two components — a flow classification speci- 
fication 203a and an action specification 203b. 
Flow Classification Specification 203a 

A flow classification specification 203a provides the 
screening criteria for the flow classifier logic 204 to sort 
network traffic into flows. A flow classification specification 
203a can be very elaborate, as detailed as defining a specific 
pair of hosts running a specific application. Alternately, a 
flow classification specification 203a can have a simple 
wildcard expression. 
Action Specification 203b 

An action specification 203b describes what to do with 
packets that match an associated flow classification speci- 
fication 203a. The action specification 203b can be as 
simple, for example, as a discard or forward decision in the 
firewall case. It can also be as complicated as IPSec encryp- 
tion rules based on a SA (Security Association) specifica- 
tion. 
Flow 

All packets that match the same flow classification speci- 
fication 203a form a flow. 

Flow Classifier 

Referring again to FIG. 3, a policy decision is at least 
initially derived by a policy-based application from the 
policy database 202. As discussed above, a flow is a stream 
of correlated packets to which policy decisions apply. With 
the described embodiments in accordance with the 
invention, referring again specifically to FIG. 3, for at least 
some of the packets, a flow classifier 204 classifies the 
packet according to one or more classification specifications 
203a and finds one or more corresponding action specifica- 
tions 203b. The found action specifications 203b are then 
provided to the policy cache 209 for later execution by the 
policy engine 106 to enforce the policy. 
Policy Binding 

Policy binding is the process of the flow classifier 204 
binding a stream with its associated action specification and 
loading the appropriate entries (stream specification 208 and 
action specifications 210) into the policy cache 209. 
Stream 

A stream is an "instantiation" of a flow — packets that 
have the same source and destination address, source and 
destination port, and protocol type. (Optionally, the appli- 
cation can add the input and output media interface to the 
stream classification criteria in addition to the packet header 
if desired.) Packets may be sorted into streams, and a flow 
may include one or more streams. All packets belonging to 
the same stream are to be regulated by the same policy. 
Policy Cache 209 

At the completion of the policy binding process, an entry 
for a given stream is created on the policy engine which 
contains all the policy information required to subsequently 
process data of the stream. 
Integrated Services 

When multiple network services are to apply to the same 
flow, this is called "Integrated Services". Integrated Services 
simplify the management of various service policies, mini- 
mize potential policy conflicts and reduce TCO (Total Cost 
of Ownership). 
Stream Specification 

A stream specification 208, shown in FIG. 3 held in policy 
cache 209, is the criteria used by the stream classifier 207 to 
uniquely identify a stream. In one embodiment, the stream 
specification 208 is compared to a 5-tuple in a packet 
header — source and destination address, source and desti- 
nation port, and protocol type. 
Action Processor 206 

Each action processor 206 executes an action based upon 
an action specification 210 in the policy cache 209. 
Packet Tagging 

Certain applications (e.g. Network Monitoring) would 
like to receive flows based on the flow classification speci- 
fication and would prefer that flow classification be per- 
formed for them. Packet tagging is a way of tagging all 
incoming packets with an application specified "tag". 
Policy-based Application 

A policy-based application provides a service to the 
network users. This service is managed by a set of policies. 
Firewall, VPN and Traffic Management are the most typical 
policy-based applications. As the industry evolves, policy- 
based applications are likely to consolidate onto a single 
platform called Integrated Services. Integrated Services has 
the benefits of centralized policy management and lower 
cost of ownership. 
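
The relationship among the stream specification, the policy binding step, and the policy cache defined above can be illustrated with a minimal sketch. It is not the disclosed hardware. The names `Packet`, `PolicyCache`, `process`, and the action strings are hypothetical, the cache is modeled as a hash table keyed on the 5-tuple, and binding is assumed to complete after a single packet, whereas the text notes that several packets may be needed for stateful classification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stream specification: the 5-tuple named in the text
# (source address, destination address, source port, destination port,
# protocol type).
StreamSpec = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str

    def stream_spec(self) -> StreamSpec:
        return (self.src, self.dst, self.sport, self.dport, self.proto)


class PolicyCache:
    """Maps a stream specification to its bound action specifications."""

    def __init__(self) -> None:
        self._entries: Dict[StreamSpec, List[str]] = {}

    def bind(self, spec: StreamSpec, actions: List[str]) -> None:
        # Policy binding: load the stream spec and its action specs.
        self._entries[spec] = list(actions)

    def lookup(self, spec: StreamSpec) -> Optional[List[str]]:
        # A dict lookup hashes the 5-tuple; no pattern matching is needed.
        return self._entries.get(spec)


def process(packet: Packet, cache: PolicyCache,
            flow_classifier: Callable[[Packet], List[str]]) -> Tuple[str, List[str]]:
    """Return which path handled the packet and the actions to apply."""
    spec = packet.stream_spec()
    actions = cache.lookup(spec)
    if actions is not None:
        return "policy engine", actions   # stream already bound
    # Assumption: one packet is enough for the application to decide.
    actions = flow_classifier(packet)     # host application, slow path
    cache.bind(spec, actions)
    return "host application", actions
```

In this sketch only the first packet of a stream reaches the host's flow classifier; later packets of the same stream are resolved by the cache lookup alone.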

Referring still to FIG. 3, the population and use of the 
policy cache 209 is now discussed in greater detail. As 
discussed above, the policy-based application 102 (typically 
a legacy application) is equipped with its own policy data- 
base 202 and flow classifier logic 204. Some of the packets 
of a stream are provided (via a data path shown logically as 
401 in FIG. 3) to the flow classifier 204. The flow classifier 
204 uses the policy database 202 to determine the action 
specifications 203b that correspond to the policies of the 
flow to which the stream belongs. The action specifications 
are provided (via the path shown logically as 402 in FIG. 3) 
to the policy cache 209. It should be noted that multiple 
packets may be required for more sophisticated flow clas- 
sification (stateful packet inspection), since the policy deci- 
sions (action specifications) may come from different appli- 
cations which may have implemented different flow 
classifiers. In those cases, the application's flow classifica- 
tion logic keeps track of the flow's state until a matching 
criterion is met. Preferably, though, just enough packets of a 
stream are provided to the flow classification logic 204 via 
the logical path 401 to properly determine the action speci- 
fications 203b for the stream. At the end of the "learning 
phase", the application software 102 has uniquely identified 
a policy for the incoming packet stream. 

Subsequent packets of the stream are then provided 
directly to the stream classifier 207 of the policy engine 106 
via the logical data path 403. Using the policy cache 209, the 
stream classifier 207 determines which action processors 
206 are to be activated for the packets of the stream. 
Specifically, the stream classifier 207 matches the packets to 
a particular stream specification 208 and then, using the 
corresponding action specifications 210, activates the proper 
action processors 206. Significantly, these "subsequent 
packets" can be acted upon without any interaction with the 
"host" policy-based application 102. The application need 
not "see" any packets belonging to that stream after the 
binding (unless the stream is actually destined for the host). 
The action processors are specialized in executing specific 
action specifications, preferably at wire speed. 

Thus, in summary, upon the completion of the policy 
binding learning process, the policy engine 106 may 
immediately take control of the bound stream and execute 
the appropriate actions in accordance with the action 
specifications 210 in the policy cache 209 without any 
intervention from the host (policy-based) application. This 
method also relieves the policy engine 106 hardware from 
doing complicated pattern matching because it can simply 
compute a hash value (or use some other identification 
function) from the well known fields (which uniquely identify 
a stream) of the packet to find its corresponding policy 
decisions (action specifications 210). The classification need 
not be done more than once for each packet even though there 
may be multiple applications. As a result, massive computing 
power is not required to do the classification on an 
ongoing basis. A benefit is inexpensive hardware cost for 
very high performance policy-based applications. 

It can be seen that in accordance with the present 
invention, use of the policy engine and policy cache not only 
addresses many if not all of the performance considerations 
discussed above in the Background, but also preserves a 
great amount of flexibility in setting network policies and 
the following considerations are taken into account. 

1) Time-to-market for application developers — Since 
time-to-market is a major concern for application 
vendors, the PAPI design minimizes the development 
effort required by the application developers in order 
for the existing applications to take advantage of the 
policy engine's enhanced performance. 

2) Maintain flexibility for developers' value-added — 
PAPI may allow application developers to enhance or 
maintain their value-add so that vendors' differentiation 
is not compromised. 

3) Platform for integrated services — PAPI has the model 
of an integrated services platform in mind. Application 
developers can, over time, migrate their services into an 
integrated platform without worrying about the 
extensibility of the API and the performance penalty. 

All of the preceding has been disclosed in a co-pending 
patent application. The focus of the current application is 
one aspect of the policy engine itself: namely, the 
architecture of the policy engine itself from a system level. One 
challenge of this (or, in fact any) hardware assisted solution 
is not only maximizing parallelism among enforcement of 
different policies, but also that the architecture requires 
flexibility among order of enforcement. This is because the 
policy enforcement is usually order sensitive. For example, 
decryption of packets needs to be performed after firewall 
policy enforcement for incoming traffic to a corporate 
network. In addition to being order sensitive, it is also a 
challenge to provide for adding new policies, even when the 
new policies to be added are not even known at the product 
release. 

FIG. 4 illustrates a policy engine architecture in 
accordance with one embodiment of the invention. The FIG. 4 
policy engine includes a Packet Input/Output Module 402, 
a Stream Classification Module 404, and a Policy 
Enforcement Module 406. 

The Packet Input/Output Module 402 receives packets, 
places the received packets in the external packet memory 
450 and notifies the Stream Classification Module 404 of 
such packets. Upon completion of all policies enforcement, 
the Packet Input/Output Module 402 transmits the packet 
from external packet memory 450 to the network. 

The Stream Classification Module 404, based on the 
policy cache, creates a Packet Service Header (shown in 
FIG. 5) for each packet. The Packet Service Header indicates 
what policies need to be enforced and in what order. The 
Stream Classification Module 404 is software programmable. 
As shown in FIG. 5, the Packet Service Header 
includes a number of pairs of AP IDs and AP Pointers. The AP 
ID uniquely defines an Action Processor and the AP Pointer 
points to the Action Spec required to enforce such policy. An 
example of an action processor is a DES engine that needs 
a 56-bit or 112-bit key to do encryption or decryption. 
The policy cache can be modified if network requirements 
change. In addition, the order of different policy 
enforcement can also be programmed to achieve different 
application requirements. 

The Policy Enforcement Module 406 includes a Packet 
Scheduler 408, On Chip Packet Buffer(s) 410, and at least 
one Action Processor 412. The Packet Scheduler 408 copies 
packets from the external packet memory 450 to the On Chip 
Packet Buffer 410. After copying the packets to the Packet 
Buffer 410, packets are fragmented into 64-byte cells. A 
Cell Service Header (FIG. 6) is added to the beginning 
of each cell. The Cell Service Header includes a 
Packet Number to uniquely identify a packet in the Policy 
Enforcement Module pipeline and a Start bit and Stop bit to 
indicate the first and last cell of a packet. A Next AP field, 
together with the AP IDs in the Packet Service Header, 
indicates to the Policy Enforcement Module 406 what is the 
next destination Action Processor of each cell. 

It is preferable to have the On Chip Packet Buffer 410 
because it allows the Action Processors 412 very low latency 
and high bandwidth access to the packets as compared with 
having to access the external Packet Memory 450. In case 
the next Action Processor 412 is busy for a cell, the On Chip 
Packet Buffer 410 serves as temporary storage for that cell. 
This prevents the blocking of following cells which need to 
go through this same Action Processor 412. 

Each Action Processor 412 performs a particular policy 
enforcement. In addition to this, it is capable of reading the 
required action spec based on the AP pointer on the Packet 
Service Header (FIG. 5). Each Action Processor may also 
have its own input and/or output FIFO to buffer the cells. 

Each cell is routed to the next Action Processor based on 
the Packet Service Header (FIG. 5) and the Cell Service 
Header (FIG. 6). The cell routing is preferably distributed to 
each Action Processor, instead of being centralized to a cell 
routing unit. This distributed approach allows for adding and 
removing policies much more easily. Upon completion of all 
policy enforcement for a particular packet, the packet 
scheduler 408 copies that packet to external packet memory 450. 
The Packet Input/Output Module 402 is then notified and 
transmits the packet to the network. 
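
The cell handling described for FIG. 4 (fragmenting a packet into 64-byte cells, marking the first and last cell, and steering each cell to its next Action Processor) may be sketched as follows. This is an illustration under stated assumptions only: the names `Cell`, `fragment`, and `route` are hypothetical, the 64 bytes are assumed to count payload only, and the Next AP field is assumed to be an index into the ordered AP ID list of the Packet Service Header. The actual field layouts are those of FIGS. 5 and 6, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

CELL_PAYLOAD_BYTES = 64  # cell size given in the text (assumed payload only)


@dataclass
class Cell:
    packet_number: int  # identifies the packet within the pipeline
    start: bool         # set on the first cell of a packet
    stop: bool          # set on the last cell of a packet
    next_ap: int        # assumed: index into the packet's ordered AP ID list
    payload: bytes


def fragment(packet_number: int, data: bytes) -> List[Cell]:
    """Split packet data into cells and attach a cell service header."""
    chunks = [data[i:i + CELL_PAYLOAD_BYTES]
              for i in range(0, len(data), CELL_PAYLOAD_BYTES)] or [b""]
    last = len(chunks) - 1
    return [Cell(packet_number, i == 0, i == last, 0, chunk)
            for i, chunk in enumerate(chunks)]


def route(cell: Cell, ap_ids: List[int]) -> Optional[int]:
    """Return the AP ID that handles the cell next, or None when all
    policies have been enforced. Advances the cell's Next AP field."""
    if cell.next_ap >= len(ap_ids):
        return None
    ap_id = ap_ids[cell.next_ap]
    cell.next_ap += 1
    return ap_id
```

Because each cell carries its own Next AP field, the routing decision can be made locally at each Action Processor, which is the distributed arrangement the text prefers over a central routing unit.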
What is claimed is: 

1. A policy engine comprising: 

a stream classification module; 

a packet input/output module that places received packets 
in an external packet memory and that notifies the 
stream classification module of the packets in the 
external packet memory; 

wherein the stream classification module creates a packet 
service header for each packet in the external packet 
memory indicating, based on a policy cache, policies to 
be enforced on that packet; 

a policy enforcement module to enforce policies on the 
packets, including 

a packet scheduler that fragments each packet into cells 
and schedules enforcement of the policies on each 
cell based on the packet service header; 
on-chip packet buffer circuitry to temporarily hold the 
packets during policy enforcement; and 
a plurality of action processors, each action processor 
performing a particular policy enforcement on a cell 
and routing the cell to a next one of the action proces- 
sors. 

***** 
ABSTRACT 



A router includes a classifier that classifies packets assigned 
to 2^(n+m) classes of service into 2^n classes of service that are 
supported by the router. The classifier then sets the loss 
priorities of the respective packets to one of m levels. The 
router uses a modified weighted random early detection 
scheme that is based on probabilities of discard associated 
with the 2^(n+m) classes of service to determine whether to 
retain or discard particular packets. The router uses a single 
buffer to store packets directed to all of the various output 
ports. The available storage locations in the buffer are linked 
to a free queue, and a weighted average depth of the free 
queue is used to determine whether or not to retain a given 
packet. The router compares the weighted average depth of 
the free queue to maximum and minimum thresholds asso- 
ciated with the particular 2^(n+m) class of service to which the 
packet is assigned. If the weighted average is above the 
maximum threshold, the packet is retained. If the weighted 
average is below the minimum threshold, the packet is 
discarded. If the weighted average is between the two 
thresholds, a probability of discard that is based on the 2^(n+m) 
classes of service is calculated and compared to a random 
value to determine whether or not the packet should be 
retained. If the probability of discard exceeds the random 
value, the packet is discarded. The other packets exit various 
output ports of the router based on weighting factors asso- 
ciated with the 2^n classes of service. 

34 Claims, 5 Drawing Sheets 
ROUTER WITH CLASS OF SERVICE 
MAPPING 

FIELD OF INVENTION 

The invention relates generally to routers and switches 
and, more particularly, to routers and switches that support 
multiple classes of service for packet routing. 

BACKGROUND OF THE INVENTION 

At network multiplexing points, such as switches or 
routers, the handling of frames or packets is generally 
determined by rules associated with classes of service to 
which given frames or packets are assigned. The classes of 
service essentially define acceptable packet or frame delays 
and probabilities of packet or frame loss. (As used herein, 
the term "packet" refers to both frames and packets, and the 
term "router" refers to both switches and routers.) 

The packets are typically assigned to classes of service 
based on information contained in the packet and/or traffic 
management rules established by either a network supervi- 
sor or a service provider. All packets assigned to the same 
class receive the same treatment. Being assigned to a 
"higher" class ensures that a packet will have a shorter 
maximum transmission delay and a lower probability of 
loss. Being assigned to a "lower" class may mean a longer 
delay and/or a greater probability of loss. 

Generally, the router maintains at each output port a buffer 
for holding packets in queues associated with the classes of 
service. The queues ensure that packets are delivered in 
order within the various classes of service, and that the 
associated rules for maximum delays and probabilities of 
loss can be enforced. Since each queue is essentially sepa- 
rately maintained, the more classes the router supports the 
more processing and storage capacity is required for a given 
number of output ports. To support "x" classes, for example, 
the router must set aside buffer storage locations for each of 
the x queues at each of its "y" ports. Further, it must 
determine for each queue whether or not a next packet 
should be retained or discarded. The router thus makes x*y 
separate calculations based on queue length and/or available 
associated storage locations to determine whether to retain 
or discard the packets, where "*" represents multiplication. 

Network standards, such as the (revised) 802.1p standard, 
have relatively recently increased the number of classes of 
service to eight classes. Routers operating under prior stan- 
dards support four classes of service, and thus, must be 
upgraded, for example, with increased storage capacities of 
the output port buffers, to support the increased number of 
classes. Such upgrading may be prohibitively expensive 
and/or it may not be feasible. Accordingly, what is needed is 
a mechanism to operate a router that supports a relatively 
small number of classes of service in an environment in 
which packets are assigned to a greater number of classes. 
Such a mechanism should, without requiring the enlarged 
storage and processing capabilities conventionally associ- 
ated with supporting the greater number of classes, maintain 
service distinctions associated with the greater number of 
classes and more importantly retain the order of packets 
within each of the greater number of classes. 

SUMMARY OF THE INVENTION 

A router maps packets assigned to 2^(n+m) classes of service 
into 2^n classes of service and assigns the packets to 2^m levels 
of loss-priority within each of the 2^n classes. The router 
includes a classifier that uses n bits of an (n+m)-bit "class of 
service identifier" to map the packets to the 2^n classes, and 
the remaining m bits to assign the loss priorities. The router 
then controls packet retention/discard with a modified 
weighted random early detection scheme based in part on 
the 2^(n+m) classes and in part on the 2^n classes, to maintain the 
probability of loss distinctions and in-order packet handling 
associated with the 2^(n+m) classes. 

A scheduler controls the transmission of packets by each 
output port based on the 2^n classes of service. The scheduler 
uses a weighted round robin scheme to ensure that packets 
from each of the classes are transmitted by each of the output 
ports within the prescribed maximum delay limits associated 
with the 2^(n+m) classes of service. 

The router includes an output buffer that holds the packets 
for all of the router's output ports. The router maintains a 
"free queue," which links the buffer storage locations avail- 
able for packet storage. To determine whether to retain or 
discard a given packet, the router compares a weighted 
average depth of the free queue with predetermined maxi- 
mum and minimum thresholds that are associated with the 
particular one of the 2^(n+m) classes of service to which the 
packet is assigned. If the weighted average exceeds the 
associated maximum threshold, the router retains the packet 
in a storage location that is then removed from the free 
queue and linked to a class of service per output port queue 
that corresponds to the class of service to which the packet 
is mapped by the classifier. If the weighted average depth 
falls below the associated minimum threshold, the router 
discards the packet. If the weighted average depth falls 
between the associated minimum and maximum thresholds, 
the router calculates a probability of discard and compares 
the probability to a "random" value. The router discards the 
packet if the probability exceeds the random value, and 
otherwise retains the packet. 
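
The three-way retain/discard decision summarized above may be sketched as follows. The function name `retain_packet` and its parameters are hypothetical. The summary does not state how the probability of discard is computed between the two thresholds, so the linear interpolation used here is purely an assumption for illustration. Note that the comparison is against the depth of the free queue, so a larger value means more available storage.

```python
import random
from typing import Optional


def retain_packet(avg_free_depth: float, min_th: float, max_th: float,
                  rand: Optional[float] = None) -> bool:
    """Return True to retain the packet, False to discard it.

    avg_free_depth: weighted average depth of the free queue.
    min_th, max_th: thresholds for the packet's class of service.
    rand: random value in [0, 1); drawn if not supplied.
    """
    if not min_th < max_th:
        raise ValueError("min_th must be less than max_th")
    if avg_free_depth > max_th:
        return True    # plenty of free storage: always retain
    if avg_free_depth < min_th:
        return False   # free storage nearly exhausted: always discard
    # Assumption: discard probability falls linearly from 1 at min_th
    # to 0 at max_th. The source does not give this formula here.
    p_discard = (max_th - avg_free_depth) / (max_th - min_th)
    if rand is None:
        rand = random.random()
    return not (p_discard > rand)
```

Under this sketch, a class with a higher loss priority would simply be given higher thresholds, so that its packets begin to be discarded while more free storage still remains.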

The maximum and minimum thresholds are set relative to 
one another such that the loss priorities associated with the 
2^(n+m) classes are maintained. As discussed below, the router 
makes only one weighted average queue depth calculation 
for the free queue, and uses this calculation to determine 
whether to retain or discard packets for the 2^n classes of 
service. This is in contrast to prior known routers that must 
maintain at each output port separate average queue depths 
for each of the class of service per port queues. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention description below refers to the accompa- 
nying drawings, of which: 
FIG. 1 is a functional block diagram of a network that 
includes routers that are constructed and operate in accor- 
dance with the invention; 

FIG. 2 is a functional block diagram of a router of FIG. 1; 

FIG. 3 illustrates a mapping of packets to classes of 
service; 

FIG. 4 is a flow chart of the operations of the router of 
FIG. 2; and 

FIG. 5 is a graph of weighted average queue depth versus 
probability of packet discard. 

DETAILED DESCRIPTION OF AN 
ILLUSTRATIVE EMBODIMENT 

Referring now to FIG. 1, a network 10 includes a plurality 
of endstations 12 and nodes 14 that transmit packets to other 
endstations 12 and nodes 14 through routers 16 and 17. The 
endstations 12 and nodes 14 assign packets to classes of 
service based on information contained in the packets and/or 
on predetermined traffic management rules that are provided 
by the network manager and/or various service providers. 
The classes of service are essentially associated with maxi- 
mum limits for transmission delays and probabilities of 
packet loss. Higher classes are associated with shorter 
maximum delays and lower probabilities of packet loss. 
Packets that must be delivered as essential parts of a 
transmission are, for example, assigned to a higher class 
than are packets that contain non-essential information. 

Preferably, the endstation 12 or the node 14 that intro- 
duces the packet to the network assigns the packet to one of 
2^(n+m) classes of service. To inform the routers 16 and 17 of 
the assignment, the endstation 12 or node 14 writes an 
appropriate class of service (COS) "tag" to a COS identifier 
field in the header that is included in the packet. The COS 
identifier field has three bits, as defined by (revised) standard 
802.1p, and the packet is thus assigned to one of eight 
classes of service, i.e., 1 of 2^3 classes. The packet is then 
forwarded by the endstation 12 or node 14 over the network 
10 to an input port 28 of a router 16 or 17. The router then 
transfers the packet through an output port 30 and over the 
network in accordance with the transmission rules and delay 
limits associated with the class of service to which the 
packet is assigned. 

The routers 17 support 2^(n+m) classes of service while the 
routers 16 support 2^n classes, where n<3. We discuss herein 
the operations of the routers 16 to assign packets to the 
various classes. Further, we discuss the operations that the 
routers 16 and/or the routers 17 perform to determine 
whether to retain or discard a packet and/or when to transmit 
a packet. 

Referring now to FIG. 2, a router 16 includes a classifier 
18 that associates a received packet with one of 2^(n+m) classes 
of service, based primarily on the COS tag, if any, included 
in the packet header. The classifier maps the 2^(n+m) classes of 
service to the 2^n classes based on, for example, the highest 
order n bits or the lowest order n bits of the COS tag. The 
classifier then uses the remaining bits of the COS tag to set 
the loss priorities of the packets. As discussed below, the loss 
priorities determine if respective packets are discarded or 
retained during times of congestion. The higher the loss 
priority of a packet, the less likely the packet will be 
retained. 

If the endstation 12 or node 14 that introduces the packet 
to the network does not support the 802.1p standard, the 
COS tag may not be included in the packet. The classifier 18 
may then assign the packet to one of the 2^(n+m) classes, 
currently 2^3 classes, based on appropriate network or service 
provider transmission rules. It may, for example, assign the 
packet to a "best effort" class. Alternatively, the router 16 
may assign the packet to a particular class of service based 
on a media access control, or MAC, address included in the 
packet. The classifier then writes the appropriate COS tag to 
the packet header. 

Referring now also to FIG. 3, the router 16 in this 
exemplary embodiment supports four classes of service, i.e., 
2^2 classes. The classifier 18 maps each of the 2^3 classes of 
service to an appropriate one of the 2^2 classes of service 
based on the two highest order bits of the 3-bit COS tag. The 
third, or lowest order bit, is then used to assign a loss priority 
to the packet. The classifier 18 thus associates a packet that 
is assigned to class of service 010 with class of service 01 
and sets the loss priority of the packet to 0. Further, the 
classifier 18 associates a packet that is assigned to class of 
service 011 with class 01 and sets the loss priority of this 
packet to 1. 
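
The mapping just described amounts to splitting the COS tag into its high-order and low-order bits. A minimal sketch, assuming the high-order-bits variant used in this exemplary embodiment (the function name `map_cos` and its default parameters are hypothetical):

```python
from typing import Tuple


def map_cos(cos_tag: int, n: int = 2, m: int = 1) -> Tuple[int, int]:
    """Map an (n+m)-bit COS tag to (class of service, loss priority).

    The n highest-order bits select one of the 2^n supported classes;
    the remaining m bits give the loss priority within that class.
    """
    if not 0 <= cos_tag < (1 << (n + m)):
        raise ValueError("COS tag does not fit in n+m bits")
    service_class = cos_tag >> m
    loss_priority = cos_tag & ((1 << m) - 1)
    return service_class, loss_priority


# The two cases given in the text:
assert map_cos(0b010) == (0b01, 0)
assert map_cos(0b011) == (0b01, 1)
```

Both example tags land in the same class 01 queue, which is what preserves packet order within each of the original eight classes.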
Referring further to FIG. 4, once the classifier 18 associ- 
ates the packets with the various 2^n classes of service and 
sets the loss priorities (steps 400-402), a policer 20 enforces 
network or service provider usage parameter controls by 
marking, discarding or passing the packets (step 404). The 
usage parameter controls are set by a network manager or 
service provider based on, for example, levels of service 
purchased by or associated with a user. The user may, for 
example, purchase a level of service based on the transmis- 
sion of a maximum number of packets per hour. If the 
number of packets being sent by the user exceeds this limit, 
the policer then marks, discards or passes the excess packets 
depending on the traffic management rules. 

If the policer 20 marks an offending packet, it assigns the 
packet to a higher loss priority within the associated class of 
service. This increases the likelihood that the packet will be 
discarded if the network becomes congested. In the example, 
the policer sets the lowest order bit of the COS tag to 1. If 
the packet is already assigned the highest loss priority within 
the class of service, the policer 20 either passes or discards 
the packet, depending on the traffic management rules. If the 
packet is passed, the policer 20 may charge the user for the 
use of excess bandwidth. 

As discussed above, the policer 20 operates in accordance 
with traffic management rules established by a network 
manager or service provider. In the embodiment described 
herein the policer determines if a packet exceeds an established 
limit by using a "jumping window policing scheme." 
The policer thus sets a police rate of B/T for a user, where 
B is a burst size and T is a time interval and both B and T 
are set by the network manager or service provider. The 
policer then counts the number of octets received from the 
user over intervals of length T. If the count exceeds B, the 
arriving packet is marked, passed or dropped, depending on 
the enforcement mode utilized by the policer. Various limits 
may be set, such as, for example, limits that vary based on 
the number of times the associated policing rate is exceeded 
by a given user and/or based on the various classes of 
service. 
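
A jumping window policer of the kind described above might be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the class name, the use of caller-supplied timestamps in seconds, and the `"conform"` return value are all hypothetical. Only B, T and the mark/pass/drop enforcement modes come from the text.

```python
class JumpingWindowPolicer:
    """Counts octets per window of length T and flags packets once the count exceeds B.

    Hypothetical sketch; names and return values are assumptions.
    """

    def __init__(self, burst_octets, interval, mode="mark"):
        if mode not in ("mark", "pass", "drop"):
            raise ValueError("mode must be 'mark', 'pass' or 'drop'")
        self.burst_octets = burst_octets  # B, set by the network manager
        self.interval = interval          # T, in seconds (assumed unit)
        self.mode = mode                  # enforcement mode for excess packets
        self.window_start = 0.0
        self.count = 0

    def police(self, now, packet_octets):
        # When the current window has expired, jump to the window that
        # contains `now` and restart the octet count.
        elapsed = now - self.window_start
        if elapsed >= self.interval:
            self.window_start += int(elapsed // self.interval) * self.interval
            self.count = 0
        self.count += packet_octets
        # Packets within the burst size conform; excess packets get the
        # configured enforcement action.
        return self.mode if self.count > self.burst_octets else "conform"


policer = JumpingWindowPolicer(burst_octets=1000, interval=1.0, mode="mark")
print(policer.police(0.0, 600))  # within B for this window
print(policer.police(0.5, 600))  # pushes the window count past B
print(policer.police(1.2, 600))  # a new window has started
```

Because the count resets at window boundaries instead of sliding, a user can send up to B octets at the end of one window and B more at the start of the next. That is the usual trade-off of a jumping window against its low bookkeeping cost.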

A WRED processor 22 determines which of the remaining 
packets, i.e., the packets that the policer has not discarded, 
are to be retained in a buffer 24 that holds the packets for 
every output port 30 (steps 406-416). The use of a single 
buffer is in contrast to prior known routers that use a separate 
buffer for each output port. 

The WRED processor 22 utilizes a modified weighted 
random early detection (WRED) scheme. The WRED processor 
associates with each of the $2^{n+m}$ classes of service, 
"$C_i$," two thresholds, namely, a maximum threshold $MAX_{C_i}$ 
and a minimum threshold $MIN_{C_i}$. As discussed below, the 
thresholds are used by the processor 22 to determine 
whether to retain or discard a given packet. 

The WRED processor 22 keeps track of an average "free 
queue" depth, which is an average number of available 
storage locations in the buffer 24. When the buffer is empty, 
all of the buffer storage locations are linked to the free 
queue. As packets are retained, buffer locations, which are 
generally referred to in 512 byte pages, are removed from 
the free queue and linked to appropriate class of service per 
output port queues. When the packets are later transmitted, 
the buffer locations are removed from the class of service per 
output port queues and again linked to the free queue. 

Each time a packet is received, the WRED processor 22 
determines a new weighted average free queue depth 

$$A_{NEW} = A_{CURRENT} + W(I - A_{CURRENT})$$

where I is the instantaneous size of the free queue, W is the 
weighting factor and $A_{CURRENT}$ is the current weighted 
average free queue depth (step 406). The weighting factor W 
is preferably selected such that multiplication is accomplished 
by shifting the difference value $(I - A_{CURRENT})$. The 
value $A_{CURRENT}$ is updated at regular intervals with the 
value of $A_{NEW}$, such as after every 64B frame time, which 
approximates the average packet arrival time. 
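
The shift-based update can be sketched as below, assuming the weighting factor is a negative power of two, W = 2^-k, so that multiplying by W reduces to an arithmetic right shift by k bits. The function name and the choice of k are illustrative assumptions; the text says only that W is selected so that a shift suffices.

```python
def update_average(a_current: int, instantaneous: int, shift: int) -> int:
    """Return A_NEW = A_CURRENT + W * (I - A_CURRENT) with W = 2**-shift.

    Hypothetical helper. Python's >> on a negative int is an arithmetic
    shift that rounds toward negative infinity.
    """
    return a_current + ((instantaneous - a_current) >> shift)


# Free queue shrinking: average 1000 pages, 600 pages currently free, W = 1/4.
print(update_average(1000, 600, 2))
# Free queue recovering: average 900 pages, 1000 pages currently free, W = 1/4.
print(update_average(900, 1000, 2))
```

A larger shift (smaller W) makes the average respond more slowly to short bursts, which is the usual reason for averaging the queue depth before comparing it with thresholds.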

The WRED processor compares the weighted average 
$A_{NEW}$ with the $MAX_{C_i}$ and $MIN_{C_i}$ values associated with the 
appropriate one of the $2^{n+m}$ classes of service. If the 
weighted average exceeds the $MAX_{C_i}$ value, the WRED 
processor 22 retains the packet (step 408). If the weighted 
average falls below the $MIN_{C_i}$ value, the WRED processor 
22 discards the packet (step 410). If, however, the average 
falls between the $MAX_{C_i}$ and $MIN_{C_i}$ values, the WRED 
processor calculates a probability of discard, $P_D$: 

$$P_D = b_{C_i} - (m_{C_i} \cdot A_{NEW})$$

where $b_{C_i}$ and $m_{C_i}$ are the intercept and slope values 
associated with the appropriate one of the $2^{n+m}$ classes of service 
(step 412). As shown in FIG. 5, the probability of discard 
changes linearly with changes in the weighted average 
queue depth. A given packet is discarded when the probability 
of discard exceeds a "random" number that is 
produced by a pseudo random generator 25 (steps 414-416). 
When the weighted average is relatively low, the probability 
of discard is larger, and thus, the packet is more likely to be 
discarded. 
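
The retain/discard decision of steps 408-416 can be sketched as follows. The function name, the string return values and the numeric thresholds in the usage lines are hypothetical. The comparison against the two thresholds and the linear form $P_D = b - (m \cdot A_{NEW})$ follow the text and claim 5, and `random.random` stands in for the pseudo random generator 25.

```python
import random


def wred_decide(a_new, min_th, max_th, intercept, slope, rng=random.random):
    """Decide whether to retain or discard a packet from the average free queue depth.

    Hypothetical sketch: a_new is the weighted average free queue depth, and
    min_th / max_th / intercept / slope are the per-class values.
    """
    if a_new > max_th:
        return "retain"    # plenty of free buffer space (step 408)
    if a_new < min_th:
        return "discard"   # free queue nearly exhausted (step 410)
    # Between the thresholds, the probability of discard falls linearly
    # as the average free queue depth rises (step 412).
    p_discard = intercept - slope * a_new
    # Discard when the probability exceeds the random draw (steps 414-416).
    return "discard" if p_discard > rng() else "retain"


# Illustrative values only: thresholds of 200 and 800 free pages.
print(wred_decide(900, 200, 800, intercept=1.0, slope=0.001))
print(wred_decide(100, 200, 800, intercept=1.0, slope=0.001))
print(wred_decide(300, 200, 800, intercept=1.0, slope=0.001, rng=lambda: 0.5))
```

Note that the thresholds apply to free space, so the sense is inverted relative to conventional WRED on an occupied queue: a high average means an empty buffer and the packet is kept.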

The slope and intercept values $m_{C_i}$ and $b_{C_i}$ are selected 
based on trade-offs between keeping links through the router 
16 busy and reserving space in the buffer 24 to handle bursts. 
For higher classes of service the slope and intercept values 
are selected to be relatively low, such that the probability 
of discard is low over the entire range from $MAX_{C_i}$ to $MIN_{C_i}$. 
The slope and intercept values for the lower classes of 
service are typically larger, reflecting the greater associated 
probability of packet loss for the class and the reservation of 
spaces in the buffer for bursts of packets assigned to the 
higher classes. The various threshold values, and slope and 
intercept values are selected such that packet order and 
probabilities of packet loss are maintained across the $2^{n+m}$ 
classes of service. 

In prior known routers, implementing a WRED scheme 
required maintaining average queue depths for all of the 
classes of service queues at each of the output ports. Thus, 
for a router to support 8 classes of service over "y" output 
ports, it had to calculate average queue depths for 8*y 
separate queues. In the current router 16, the WRED processor 
calculates the average depth of a single free queue, 
regardless of the number of classes of service. 

A scheduler 26 implements a $2^n$ class-based weighted 
round robin (WRR) scheduling scheme for each output port 
(step 418). The scheduler associates an appropriate weighting 
factor $W_{C_i}$ with each class of service per output port 
queue. The scheduler de-queues $W_{C_i}$ packets for transfer 
from the queue associated with one of the $2^n$ classes of 
service, and then de-queues $W_{C_{i+1}}$ packets from the $Q_{i+1}$ 
queue for subsequent transfer. If the $Q_{i+1}$ queue is empty, the 
scheduler de-queues an appropriate number of packets from 
the $Q_{i+2}$ queue, and so forth. The scheduler 26 thus ensures 
that each one of the $2^n$ classes of service is associated with 
an appropriate maximum delay limit and through-put allocation. 

The class of service mapping, modified WRED scheme 
and WRR scheme in combination ensure that packets are 
transferred through the router 16 as if the router supported 
the $2^{n+m}$ classes of service. The router 16, however, requires 
less processing and storage overhead than the prior known 
routers that support the same number of classes, since the 
router 16 actually supports $2^n$ classes of service, and uses a 
single output buffer to do so. 

What is claimed is: 

1. A router for use in routing packets over a network, the 
router supporting a plurality, X, of classes of service and 
including: 

A. a plurality of input ports for receiving packets over the 
network; 

B. a plurality of output ports for transferring packets over 
the network; 

C. a classifier for assigning packets received by the input 
ports to X*Y classes of service, where * represents 
multiplication, and mapping the X*Y classes of service 
to the X classes of service that are supported by the 
router, the classifier assigning to the packet one of Y 
associated levels of priority, wherein each level of 
priority is associated with a different probability of 
packet loss; 

D. means for retaining the packets based on probabilities 
of discard associated with the X*Y classes of service; 
and 

E. scheduling means for transferring the packets through 
each of the output ports based on the X classes of 
service. 

2. The router of claim 1 further including a multiple 
storage location buffer for retaining packets to be transferred 
through the output ports, the buffer linking the storage 
locations that contain packets in class of service per output 
port queues and linking available storage locations in a free 
queue. 

3. The router of claim 2 wherein the means for retaining 
the packets further includes: 

i. means for determining a new weighted average depth 
for the free queue, and 

ii. means for determining a probability of discard for a 
given packet if the new weighted average queue depth 
falls below a predetermined maximum threshold associated 
with the class of service to which the packet is 
assigned. 

4. The router of claim 3 wherein the means for retaining 
the packets discards a given packet if the associated 
weighted average depth for the free queue falls below a 
minimum threshold associated with the class of service to 
which the packet is assigned. 

5. The router of claim 4 wherein the means for retaining 
the packets calculates the probability of discard as 
$P_D = c - (m * A_{NEW})$ where c is an intercept and m is a slope that is 
associated with a line that plots a weighted average free 
queue depth versus probability of discard for the class of 
service to which the packet is assigned, and $A_{NEW}$ is the 
weighted average depth of the free queue. 

6. The router of claim 5 wherein the means for retaining 
the packets calculates the new weighted average depth of the 
free queue as $A_{NEW} = A_{CURRENT} + w(I - A_{CURRENT})$ where w is 
a weighting factor, I represents the instantaneous depth of 
the free queue and $A_{CURRENT}$ is the current weighted average 
depth of the free queue. 

7. The router of claim 6 wherein the scheduling means 
selects packets for transfer based on weighting factors 
associated with the respective X classes of service. 

8. The router of claim 1 wherein the router supports $X = 2^n$ 
classes of service and the classifier assigns packets to 
$X*Y = 2^{n+m}$ classes of service. 

9. A router for use in routing packets over a network, the 
router supporting a plurality, X, of classes of service and 
including: 

A. a plurality of input ports for receiving packets over the 
network; 

B. a plurality of output ports for transferring packets over 
the network; 

C. a multiple storage location buffer for retaining packets 
to be transferred through the output ports; 

D. means for retaining the packets based on probabilities 
of discard associated with X* Y classes of service where 
* represents multiplication; and 

E. scheduling means for transferring the packets through 
each of the output ports based on the X classes of 
service that the router supports. 

10. The router of claim 9 further including a classifier for: 
i. assigning packets received by the input ports to X*Y 

classes of service, 

ii. associating the packets with the X classes of service 
that are supported by the router, and 

iii. assigning to the packet one of Y associated levels of 
priority, wherein each level of priority is associated 
with a different probability of packet loss. 

11. The router of claim 10 wherein the means for retaining 
the packets further includes 

i. means for determining a new weighted average queue 
depth for a free queue that links available buffer storage 
locations, and 

ii. means for determining a probability of discard for a 
given packet if the new weighted average free queue 
depth falls below a predetermined maximum threshold 
associated with the class of service to which the packet 
is assigned. 

12. The router of claim 11 wherein the means for retaining 
the packets calculates the probability of discard as 
$P_D = c - (m * A_{NEW})$ where c is an intercept and m is a slope that are 
associated with a line that plots a weighted average free 
queue depth versus probability of discard for the class of 
service to which the packet is assigned, and $A_{NEW}$ is the 
weighted average depth of the free queue. 

13. The router of claim 12 wherein the means for retaining 
the packets calculates the weighted average depth of the free 
queue as $A_{NEW} = A_{CURRENT} + w(I - A_{CURRENT})$ where w is a 
weighted factor, I represents the instantaneous depth of the 
free queue and $A_{CURRENT}$ is the current weighted average 
depth of the free queue. 

14. The router of claim 13 wherein the means for retaining 
the packets discards a given packet if the new weighted 
average free queue depth falls below a minimum threshold 
associated with the class of service to which the packet is 
assigned. 

15. The router of claim 13 wherein the means for retaining 
the packets retains a given packet if the new weighted 
average free queue depth is above a maximum threshold 
associated with the class of service to which the packet is 
assigned. 

16. The router of claim 9 wherein the scheduling means 
selects packets for transfer through each output port based 
on weighting factors associated with the respective X classes 
of service. 

17. The router of claim 16 wherein the buffer links 
retained packets in class of service per output port queues 
and the scheduling means selects packets from the class of 
service per output port queues. 

18. The router of claim 9 wherein the router supports 
$X = 2^n$ classes of service and the means for retaining retains 
packets based on probabilities of discard associated with 
$X*Y = 2^{n+m}$ classes of service. 

19. A method of routing packets through a router that 
supports a plurality, X, of classes of service, the method 
including the steps of: 

A. receiving packets through one or more input ports; 

B. assigning packets received by the input ports to X*Y 
classes of service, where * represents multiplication, 
mapping the X*Y classes of service to the X classes of 
service that are supported by the router, and assigning 
to the packet one of Y associated levels of priority, 
wherein each level of priority is associated with a 
different probability of packet loss; 

C. retaining the packets based on probabilities of discard 
associated with the X*Y classes of service; and 

D. transferring the packets through one or more output 
ports based on the X classes of service. 

20. The method of routing packets of claim 19 further 
including in the step of retaining the packets the steps of: 

i. retaining the packets in a multiple storage location 
buffer and linking available storage locations to a free 
queue, 

ii. determining a new weighted average depth for the free 
queue, and 

iii. determining a probability of discard for a given packet 
if the new weighted average queue depth falls below a 
predetermined maximum threshold associated with the 
class of service to which the packet is assigned. 

21. The method of routing packets of claim 20 including 
in the step of retaining the packets the further step of 
discarding a given packet if the new weighted average depth 
for the free queue falls below a minimum threshold associated 
with the class of service to which the packet is assigned. 

22. The method of routing packets of claim 21 wherein the 
step of retaining the packets includes calculating the probability 
of discard as $P_D = c - (m * A_{NEW})$ where c is an intercept 
and m is a slope associated with a line that plots weighted 
average free queue depth versus probability of discard for 
the class of service to which the packet is assigned, and 
$A_{NEW}$ is the new weighted average depth of the free queue. 

23. The method of routing packets of claim 22 wherein the 
step of retaining the packets includes calculating the new 
weighted average depth of the free queue as 
$A_{NEW} = A_{CURRENT} + w(I - A_{CURRENT})$ where w is a weighting factor, I 
represents the instantaneous depth of the free queue and 
$A_{CURRENT}$ is the current weighted average queue depth. 

24. The method of claim 23 wherein the discard means 
retains a given packet if the new weighted average free 
queue depth is above a maximum threshold associated with 
the class of service to which the packet is assigned. 

25. The method of routing packets of claim 19 wherein the 
step of transferring packets through the one or more output 
ports transfers the packets based on weighting factors associated 
with the respective X classes of service. 

26. The method of routing packets of claim 19 wherein the 
router supports $X = 2^n$ classes of service and, in the step of 
assigning packets, the packets are assigned to $X*Y = 2^{n+m}$ 
classes of service. 

27. A method of routing packets through a router that 
supports a plurality, X, of classes of service, the method 
including: 

A. receiving packets through one or more input ports and 
assigning the packets to X*Y classes of service, where 
* represents multiplication; 

B. retaining packets based on probabilities of discard 
associated with the X*Y classes of service in a multiple 
storage location buffer that links available storage 
locations to a free queue; and 

C. transferring the packets through one or more output 
ports based on the X classes of service. 

28. The method of routing of claim 27 further including 
the steps: 

i. associating the packets that are assigned to the X*Y 
classes of service with the X classes of service that are 
supported by the router, and 

ii. assigning to the respective packets one of Y associated 
levels of priority, wherein each level of priority is associated 
with a different probability of packet loss. 

29. The method of routing packets of claim 28 wherein the 
step of retaining the packets includes: 

a. determining a new weighted average depth for the free 
queue, and 

b. determining a probability of discard for a given packet 
if the new weighted average free queue depth falls 
below a predetermined maximum threshold associated 
with the class of service to which the packet is 
assigned. 

30. The method of routing packets of claim 29 wherein the 
step of retaining packets further includes calculating the 
probability of discard as $P_D = c - (m * A_{NEW})$ where c is an 
intercept and m is a slope that are associated with a line that 
plots average free queue depth versus probability of discard 
for the class of service to which the packet is assigned, and 
$A_{NEW}$ is the new weighted average depth of the free queue. 






31. The method of routing packets of claim 30 wherein the 
step of retaining packets further includes calculating the new 
weighted average depth of the free queue as 
$A_{NEW} = A_{CURRENT} + w(I - A_{CURRENT})$ where w is a weighting factor, I 
represents the instantaneous depth of the free queue and 
$A_{CURRENT}$ is the current weighted average queue depth. 

32. The method of routing packets of claim 29 wherein the 
step of retaining packets further includes discarding a given 
packet if the new weighted average free queue depth falls 
below a minimum threshold associated with the class of 
service to which the packet is assigned. 

33. The method of routing packets of claim 29 wherein the 
step of retaining packets further includes retaining a given 
packet if the new weighted average free queue depth is 
above the maximum threshold associated with the class of 
service to which the packet is assigned. 

34. The method of routing packets of claim 27 wherein the 
router supports $X = 2^n$ classes of service and, in the step of 
retaining packets, the packets are retained based on probabilities 
of discard associated with $X*Y = 2^{n+m}$ classes of 
service. 

ABSTRACT 



In packet communication, a method for automatically clas- 
sifying packet flows for use in allocating bandwidth 
resources and the like by a rule of assignment of a service 
level. By rendering discoverable the attributes of a flow 
specification for packet flows, a finer grained hierarchy of 
classification is provided automatically that is based on 
information which is specific to the type of program or 
application supported by the flow and thus allowing greater 
flexibility in control over different flows within the same 
application. The method comprises applying individual 
instances of traffic classification paradigms to packet net- 
work flows based on selectable information obtained from a 
plurality of layers to define a characteristic class, then 
mapping the flow to the defined traffic class. The flow 
specification is provided with some application-specific 
attributes, some of which are discoverable. The discoverable 
attributes lead to an ability to automatically create sub-nodes 
of nodes for finer-grained control. 

17 Claims, 7 Drawing Sheets 

METHOD FOR AUTOMATICALLY 
CLASSIFYING TRAFFIC WITH ENHANCED 
HIERARCHY IN A PACKET 
COMMUNICATIONS NETWORK 

CROSS-REFERENCES TO RELATED 
APPLICATIONS 

This application claims priority from a commonly owned 
U.S. Provisional Patent Application, Serial No. 60/066,864, 
filed Nov. 25, 1997, in the name of Guy Riddle and Robert 
L. Packer, entitled "Method for Automatically Classifying 
Traffic in a Policy Based Bandwidth Allocation System." 

This is a continuation-in-part of U.S. application Ser. No. 
09/990,354 filed Nov. 23, 2001, now U.S. Pat. No. 6,457,051, 
in the name of Guy Riddle and Robert L. Packer, 
entitled Method For Automatically Classifying Traffic In A 
Packet Communications Network, which is a continuation 
of application Ser. No. 09/198,090 filed Nov. 23, 1998, now 
U.S. Pat. No. 6,412,000, also in the name of Guy Riddle and 
Robert L. Packer, also entitled Method For Automatically 
Classifying Traffic In A Packet Communications Network. 

The following related commonly-owned U.S. patent 
application is hereby incorporated by reference in its entirety 
for all purposes: U.S. patent application Ser. No. 09/198,051, 
filed Nov. 23, 1998, still pending, in the name of Guy 
Riddle, entitled "Method for Automatically Determining a 
Traffic Policy in a Packet Communications Network." 

Further, this application makes reference to the following 
commonly owned U.S. Patents and Applications, which are 
incorporated by reference herein in their entirety for all 
purposes: 

U.S. Pat. No. 5,802,106, in the name of Robert L. Packer, 
entitled "Method for Rapid Data Rate Detection in a Packet 
Communication Environment Without Data Rate 
Supervision," relates to a technique for automatically 
determining the data rate of a TCP connection; 

U.S. patent application Ser. No. 08/742,994, now U.S. 
Pat. No. 6,038,216, in the name of Robert L. Packer, entitled 
"Method for Explicit Data Rate Control in a Packet 
Communication Environment Without a Data Rate Supervision," 
relates to a technique for automatically scheduling TCP 
packets for transmission; 

U.S. Pat. No. 6,046,980, in the name of Robert L. Packer, 
entitled "Method for Managing Flow Bandwidth Utilization 
at Network, Transport and Application Layers in Store and 
Forward Network," relates to a technique for automatically 
allocating bandwidth based upon data rates of TCP connections 
according to a hierarchical classification paradigm; and 

U.S. patent application Ser. No. 08/742,994 now U.S. Pat. 
No. 6,038,216 issued Mar. 14, 2000, in the name of Robert 
L. Packer, entitled "Method for Explicit Data Rate Control 
in a Packet Communication Environment Without a Data 
Rate Supervision," relates to a technique for automatically 
scheduling TCP packets for transmission. 

STATEMENT AS TO RIGHTS TO INVENTIONS 
MADE UNDER FEDERALLY SPONSORED 
RESEARCH OR DEVELOPMENT 

NOT APPLICABLE 

REFERENCE TO A "SEQUENCE LISTING," A 
TABLE, OR A COMPUTER PROGRAM LISTING 
APPENDIX SUBMITTED ON A COMPACT 
DISK 

NOT APPLICABLE 

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright 
rights whatsoever. 

BACKGROUND OF THE INVENTION 

This invention relates to digital packet 
telecommunications, and particularly to management of 
network bandwidth based on information ascertainable from 
multiple layers of the OSI network model. It is particularly 
useful in conjunction with bandwidth allocation mechanisms 
employing traffic classification in a digitally-switched 
packet telecommunications environment, as well as in 
monitoring, security and routing. 

The ubiquitous TCP/IP protocol suite, which implements 
the world-wide data communication network environment 
called the Internet and is also used in private networks 
(Intranets), intentionally omits explicit supervisory function 
over the rate of data transport over the various media which 
comprise the network. While there are certain perceived 
advantages, this characteristic has the consequence of jux- 
taposing very high-speed packet flows and very low-speed 
packet flows in potential conflict for network resources, 
which results in inefficiencies. Certain pathological loading 
conditions can result in instability, overloading and data 
transfer stoppage. Therefore, it is desirable to provide some 
mechanism to optimize efficiency of data transfer while 
minimizing the risk of data loss. Early indication of the rate 
of data flow which can or must be supported is imperative. 
In fact, data flow rate capacity information is a key factor for 
use in resource allocation decisions. For example, if a 
particular path is inadequate to accommodate a high rate of 
data flow, an alternative route can be sought out. 

Internet/Intranet technology is based largely on the TCP/ 
IP protocol suite, where IP, or Internet Protocol, is the 
network layer protocol and TCP, or Transmission Control 
Protocol, is the transport layer protocol. At the network 
level, IP provides a "datagram" delivery service. By 
contrast, TCP builds a transport level service over the 
datagram service to provide guaranteed, sequential delivery 
of a byte stream between two IP hosts. 

TCP flow control mechanisms operate exclusively at the 
end stations to limit the rate at which TCP endpoints emit 
data. However, TCP lacks explicit data rate control. The 
basic flow control mechanism is a sliding window, superimposed 
on a range of bytes beyond the last explicitly-acknowledged 
byte. Its sliding operation limits the amount 
of unacknowledged transmissible data that a TCP endpoint 
can emit. 

Another flow control mechanism is a congestion window, 
which is a refinement of the sliding window scheme, which 
employs conservative expansion to fully utilize all of the 
allowable window. A component of this mechanism is 
sometimes referred to as "slow start". 

The sliding window flow control mechanism works in 
conjunction with the Retransmit Timeout Mechanism 
(RTO), which is a timeout to prompt a retransmission of 
unacknowledged data. The timeout length is based on a 
running average of the Round Trip Time (RTT) for acknowledgment 
receipt, i.e. if an acknowledgment is not received 
within (typically) the smoothed RTT + 4 × mean deviation, 
then packet loss is inferred and the data pending acknowledgment 
is retransmitted. 
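
The timeout rule stated above, smoothed RTT plus four times the mean deviation, can be illustrated with a short sketch. The function names are hypothetical. The smoothing gains of 1/8 and 1/4 are the values conventionally used in TCP implementations and are an assumption here; the text does not specify them.

```python
def retransmit_timeout(smoothed_rtt, mean_deviation):
    """Timeout after which loss is inferred: smoothed RTT + 4 * mean deviation."""
    return smoothed_rtt + 4.0 * mean_deviation


def update_estimates(smoothed_rtt, mean_deviation, sample,
                     gain=0.125, dev_gain=0.25):
    """Fold one RTT sample into the running averages.

    The gains are assumed conventional values, not taken from the text.
    """
    error = sample - smoothed_rtt
    smoothed_rtt += gain * error
    mean_deviation += dev_gain * (abs(error) - mean_deviation)
    return smoothed_rtt, mean_deviation


# One slow acknowledgment (2.0 s against a 1.0 s average) raises both
# the smoothed RTT and the deviation, and so lengthens the timeout.
srtt, dev = update_estimates(1.0, 0.25, sample=2.0)
print(srtt, dev, retransmit_timeout(srtt, dev))
```

Including the deviation term keeps the timeout from firing spuriously on paths whose round trip time varies widely.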

Data rate flow control mechanisms which are operative 
end-to-end without explicit data rate control draw a strong 
inference of congestion from packet loss (inferred, typically, 
by RTO). TCP end systems, for example, will "back-off", 
i.e., inhibit transmission in increasing multiples of the base 
RTT average as a reaction to consecutive packet loss. 

Bandwidth Management in TCP/IP Networks 

Conventional bandwidth management in TCP/IP net- 
works is accomplished by a combination of TCP end sys- 
tems and routers which queue packets and discard packets 
when certain congestion thresholds are exceeded. The 
discarded, and therefore unacknowledged, packet serves as 
a feedback mechanism to the TCP transmitter. (TCP end 
systems are clients or servers running the TCP transport 
protocol, typically as part of their operating system.) 

The term "bandwidth management" is often used to refer 
to link level bandwidth management, e.g. multiple line 
support for Point to Point Protocol (PPP). Link level band- 
width management is essentially the process of keeping 
track of all traffic and deciding whether an additional dial 
line or ISDN channel should be opened or an extraneous one 
closed. The field of this invention is concerned with network 
level bandwidth management, i.e. policies to assign avail- 
able bandwidth from a single logical link to network flows. 

In U.S. Pat. No. 6,038,216, in the name of Robert L. 
Packer, entitled "Method for Explicit Data Rate Control in 
a Packet Communication Environment Without Data Rate 
Supervision," a technique for automatically scheduling TCP 
packets for transmission is disclosed. Furthermore, in U.S. 
Pat. No. 5,802,106, in the name of Robert L. Packer, entitled 
"Method for Rapid Data Rate Detection in a Packet Com- 
munication Environment Without Data Rate Supervision," a 
technique for automatically determining the data rate of a 
TCP connection is disclosed. Finally, in a U.S. patent 
application Ser. No. 08/977,376, now U.S. Pat. No. 6,046, 
980, in the name of Robert L. Packer, entitled "Method for 
Managing Flow Bandwidth Utilization at Network, Trans- 
port and Application Layers in Store and Forward Network," 
a technique for automatically allocating bandwidth based 
upon data rates of TCP connections according to a hierar- 
chical classification paradigm is disclosed. 

Automated tools assist the network manager in configur- 
ing and managing the network equipped with the rate control 
techniques described in these copending applications. In a 
related copending application, a tool is described which 
enables a network manager to automatically produce poli- 
cies for traffic being automatically detected in a network. It 
is described in a copending U.S. patent application Ser. No. 
09/198,051, still pending in the name of Guy Riddle, entitled 
"Method for Automatically Determining a Traffic Policy in 
a Packet Communications Network," based on U.S. Provi- 
sional Patent Application Serial No. 60/066,864. The subject 
of the present invention is also a tool designed to assist the 
network manager. 

While these efforts teach methods for solving problems 
associated with scheduling transmissions, automatically 
determining data flow rate on a TCP connection, allocating 
bandwidth based upon a classification of network traffic and 
automatically determining a policy, respectively, there is no 
teaching in the prior art of methods for automatically 
classifying packet traffic based upon information gathered 
from multiple layers in a multi-layer protocol network. 

Bandwidth has become an expensive commodity as traffic 
expands faster than resources and the need to "prioritize" a 
scarce resource becomes ever more critical. One way to 
solve this is by applying "policies" to control traffic classi- 
fied as to type of service required in order to more efficiently 
match resources with traffic. 

Traffic may be classified by type, e.g., E-mail, web surfing,
file transfer, at various levels. For example, to classify by
network paradigm, examining messages for an IEEE
source/destination service access point (SAP) or a sub-layer access
protocol (SNAP) yields a very broad indicator, i.e., SNA or
IP. More specific types exist, such as whether an IP protocol
field in an IP header indicates TCP or UDP. Well known
connection ports provide indications at the application layer,
i.e., SMTP or HTTP.
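
The progression from broad to specific indicators described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the method of any product named in this document: the function name `broad_to_specific` and the two small lookup tables are hypothetical. The IP protocol numbers (6 for TCP, 17 for UDP) and the ports (21 for FTP, 25 for SMTP, 80 for HTTP) are the standard assigned values.

```python
# Hypothetical sketch: classify one packet at successively finer levels.
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}                     # IP header protocol field
WELL_KNOWN_PORTS = {21: "FTP", 25: "SMTP", 80: "HTTP"}   # application-layer hints


def broad_to_specific(network, ip_protocol=None, server_port=None):
    """Return the classification path, broadest indicator first."""
    path = [network]                  # e.g. "SNA" or "IP", as found via SAP/SNAP
    if network != "IP":
        return path                   # nothing finer is modelled for non-IP traffic
    transport = IP_PROTOCOLS.get(ip_protocol)
    if transport is None:
        return path                   # unknown protocol field: stop at "IP"
    path.append(transport)
    service = WELL_KNOWN_PORTS.get(server_port)
    if service is not None:
        path.append(service)          # well-known port gives the application
    return path
```

A call such as `broad_to_specific("IP", 6, 25)` walks all three levels, while a non-IP packet stops at the first.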

Classification is not new. Firewall products like "CheckPoint
FireWall-1," a product of Checkpoint Software
Technologies, Inc., a company with headquarters in Redwood
City, Calif., have rules for matching traffic. Prior
bandwidth managers classify by destination. The
PacketShaper, a product of Packeteer, Inc., a company with
headquarters in Cupertino, Calif., allows a user to manually
enter rules to match various traffic types for statistical
tracking, i.e., counting by transaction, byte count, rates, etc.
However, manual rule entry requires a level of expertise that
limits the appeal for such a system to network savvy
customers. What is really needed is a method for analyzing
real traffic in a customer's network and automatically producing
a list of the "found traffic."

SUMMARY OF THE INVENTION 

According to the invention, in a packet communication
environment, a method is provided for automatically classifying
packet flows for use in allocating bandwidth
resources and the like by a rule of assignment of a service
level. By rendering discoverable the attributes of a flow
specification for packet flows, a finer grained hierarchy of
classification is provided automatically that is based on
information which is specific to the type of program or
application supported by the flow, thus allowing greater
flexibility in control over different flows within the same
application. The method comprises applying individual
instances of traffic classification paradigms to packet network
flows based on selectable information obtained from a
plurality of layers of a multi-layered communication protocol
in order to define a characteristic class, then mapping the
flow to the defined traffic class. The flow specification is
provided with some application-specific attributes, some of
which are discoverable. The discoverable attributes lead to
an ability to automatically create sub-nodes of nodes for
finer-grained control. The automatic classification is sufficiently
robust to classify a complete enumeration of the
possible traffic.

In the present invention, network managers need not know
the technical aspects of each kind of traffic in order to
configure traffic classes. Service aggregates bundle traffic
to provide a convenience to the user by clarifying processing,
and enable the user to obtain group counts of all parts
comprising a service.

The invention will be better understood upon reference to
the following detailed description in connection with the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1A depicts a representative client server relationship
in accordance with a particular embodiment of the invention;

FIG. 1B depicts a functional perspective of the representative
client server relationship in accordance with a particular
embodiment of the invention;

FIG. 1C depicts a representative internetworking environment
in accordance with a particular embodiment of the
invention;

FIG. 1D depicts a relationship diagram of the layers of the
TCP/IP protocol suite;

FIGS. 2A-2B depict representative divisions of bandwidth;

FIG. 3 depicts a component diagram of processes and data
structures in accordance with a particular embodiment of the
invention; and

FIGS. 4A-4B depict flowcharts of process steps in automatically
classifying traffic in accordance with a particular
embodiment of the invention.

DESCRIPTION OF SPECIFIC EMBODIMENTS 
OF THE INVENTION 

1.0 Introduction 

The present invention provides techniques to automatically
classify a plurality of heterogeneous packets in a
packet telecommunications system for management of network
bandwidth in systems such as a private area network,
a wide area network or an internetwork. Systems according
to the present invention enable network managers to automatically
define traffic classes, for which policies may then
be created for specifying service levels for the traffic classes
and isolating bandwidth resources associated with certain
traffic classes. Inbound as well as outbound traffic may be
managed. Below is a definitional list of terminology used
herein.

List of Definitional Terms 

ADMISSIONS CONTROL A policy invoked whenever a
system according to the invention detects that a guaranteed
information rate cannot be maintained. An admissions control
policy is analogous to a busy signal in the telephone
world.

CLASS SEARCH ORDER A search method based upon
traversal of an N-ary tree data structure containing classes.

COMMITTED INFORMATION RATE (CIR) A rate of 
data flow allocated to reserved service traffic for rate based 
bandwidth allocation for a committed bandwidth. Also 
called a guaranteed information rate (GIR). 

EXCEPTION A class of traffic provided by the user which
supersedes an automatically determined classification order.

EXCESS INFORMATION RATE (EIR) A rate of data 
flow allocated to reserved service traffic for rate based 
bandwidth allocation for uncommitted bandwidth resources. 

FLOW A flow is a single instance of a connection or
packet-exchange activity. For example, all packets in a TCP
connection belong to the same flow, as do all packets in a
UDP session. A flow always is associated with a traffic class.

GUARANTEED INFORMATION RATE (GIR) A rate of
data flow allocated to reserved service traffic for rate based
bandwidth allocation for a committed bandwidth. Also
called a committed information rate (CIR).

INSIDE On the LAN side of the bandwidth management
device.

MATCHING RULE A description which is used to determine
whether a flow matches a traffic class, e.g., "inside
service:http", which will match any flows which are connected
to an HTTP server on the "inside" of the bandwidth
management device. Also known as a "traffic specification".

OUTSIDE On the WAN or Internet side of the bandwidth 
management device. 

PARTITION Partition is an arbitrary unit of network 
resources. 

POLICY A rule assigned to a given class that defines how
the traffic associated with the class will be handled during
bandwidth management.

POLICY INHERITANCE A method for assigning policies
to flows for which no policy exists in a hierarchical
arrangement of policies. For example, if a flow matches the
traffic class for FTP traffic to Host A, and no corresponding
policy exists, a policy associated with a less specific node,
such as the traffic class which matches FTP traffic to any
host, may be located and used.

POLICY BASED SCALING An adjustment of a 
requested data rate for a particular flow based upon the 
policy associated with the flow and information about the 
flow's potential rate. 

SCALED RATE Assignment of a data rate based upon 
detected speed. 

SERVICE LEVEL A service paradigm having a combi- 
nation of characteristics defined by a network manager to 
handle a particular class of traffic. Service levels may be 
designated as either reserved or unreserved. 

TRAFFIC CLASS A logical grouping of traffic flows that 
share the same characteristics — such as application, 
protocol, address, or set of addresses. A traffic class is 
defined with a series of matching rules. 

TRAFFIC SPECIFICATION See "matching rule". 

URI A Universal Resource Identifier is the name of the 
location field in a web reference address. It is also called a 
URL or Universal Resource Locator.

1.1 Hardware Overview 

The method for classifying heterogeneous packets in a 
packet telecommunications environment of the present 
invention may be implemented in the C programming lan- 
guage and made operational on a computer system such as 
shown in FIG. 1A. This invention may be implemented in a 
client-server environment, but a client-server environment is 
not essential. This figure shows a conventional client-server 
computer system which includes a server 20 and numerous 
clients, one of which is shown as client 25. The
term "server" is used in the context of the invention, wherein
the server receives queries from (typically remote) clients, 
does substantially all the processing necessary to formulate 
responses to the queries, and provides these responses to the 
clients. However, server 20 may itself act in the capacity of 
a client when it accesses remote databases located at another 
node acting as a database server. 

The hardware configurations are in general standard and 
will be described only briefly. In accordance with known 
practice, server 20 includes one or more processors 30 which 
communicate with a number of peripheral devices via a bus 
subsystem 32. These peripheral devices typically include a 
storage subsystem 35, comprised of a memory subsystem 
35a and a file storage subsystem 35b holding computer 
programs (e.g., code or instructions) and data, a set of user 
interface input and output devices 37, and an interface to 
outside networks, which may employ Ethernet, Token Ring, 
ATM, IEEE 802.3, ITU X.25, Serial Link Internet Protocol 
(SLIP) or the public switched telephone network. This 
interface is shown schematically as a "Network Interface" 
block 40. It is coupled to corresponding interface devices in 
client computers via a network connection 45. 

Client 25 has the same general configuration, although 
typically with less storage and processing capability. Thus, 
while the client computer could be a terminal or a low-end 
personal computer, the server computer is generally a high- 
end workstation or mainframe, such as a SUN SPARC 
server. Corresponding elements and subsystems in the client 
computer are shown with corresponding, but primed, refer- 
ence numerals. 

Bus subsystem 32 is shown schematically as a single bus,
but a typical system has a number of buses such as a local
bus and one or more expansion buses (e.g., ADB, SCSI, ISA,
EISA, MCA, NuBus, or PCI), as well as serial and parallel
ports. Network connections are usually established through
a device such as a network adapter on one of these expansion
buses or a modem on a serial port. The client computer may
be a desktop system or a portable system.

The user interacts with the system using interface devices
37' (or devices 37 in a standalone system). For example,
client queries are entered via a keyboard, communicated to
client processor 30', and thence to modem or network
interface 40' over bus subsystem 32'. The query is then
communicated to server 20 via network connection 45.
Similarly, results of the query are communicated from the
server to the client via network connection 45 for output on
one of devices 37' (say a display or a printer), or may be
stored on storage subsystem 35'.

FIG. 1B is a functional diagram of a computer system
such as that of FIG. 1A. FIG. 1B depicts a server 20, and a
representative client 25 of a plurality of clients which may
interact with the server 20 via the Internet 45 or any other
communications method. Blocks to the right of the server
are indicative of the processing steps and functions which
occur in the server's program and data storage indicated by
blocks 35a and 35b in FIG. 1A. A TCP/IP "stack" 44 works
in conjunction with Operating System 42 to communicate
with processes over a network or serial connection attaching
Server 20 to Internet 45. Web server software 46 executes
concurrently and cooperatively with other processes in
server 20 to make data objects 50 and 51 available to
requesting clients. A Common Gateway Interface (CGI)
script 55 enables information from user clients to be acted
upon by web server 46, or other processes within server 20.
Responses to client queries may be returned to the clients in
the form of Hypertext Markup Language (HTML) document
outputs which are then communicated via Internet 45
back to the user.

Client 25 in FIG. 1B possesses software implementing
functional processes operatively disposed in its program and
data storage as indicated by block 35a' in FIG. 1A. TCP/IP
stack 44' works in conjunction with Operating System 42' to
communicate with processes over a network or serial connection
attaching Client 25 to Internet 45. Software implementing
the function of a web browser 46' executes concurrently
and cooperatively with other processes in client 25
to make requests of server 20 for data objects 50 and 51. The
user of the client may interact via the web browser 46' to
make such queries of the server 20 via Internet 45 and to
view responses from the server 20 via Internet 45 on the web
browser 46'.

Network Overview

FIG. 1C is illustrative of the internetworking of a plurality
of clients such as client 25 of FIGS. 1A and 1B and a
plurality of servers such as server 20 of FIGS. 1A and 1B as
described herein above. In FIG. 1C, network 60 is an
example of a prior art Token Ring or frame oriented network.
Network 60 links host 61, such as an IBM RS6000
RISC workstation, which may be running the AIX operating
system, to host 62, which is a personal computer, which may
be running Windows, IBM OS/2 or a DOS operating system,
and host 63, which may be an IBM AS/400 computer, which
may be running the OS/400 operating system. Network 60
is internetworked to network 70 via a system gateway which
is depicted here as router 75, but which may also be a
gateway having a firewall or a network bridge. Network 70
is an example of an Ethernet network that interconnects host
71, which is a SPARC workstation, which may be running
SUNOS operating system, with host 72, which may be a
VAX 6000 computer which may be running the VMS
operating system (formerly available from Digital Equipment
Corporation).

Router 75 is a network access point (NAP) of network 70
and network 60. Router 75 employs a Token Ring adapter
and Ethernet adapter. This enables router 75 to interface with
the two heterogeneous networks. Router 75 is also aware of
the Inter-network Protocols, such as ICMP and RIP, which
are described herein below.

FIG. 1D is illustrative of the constituents of the Transmission
Control Protocol/Internet Protocol (TCP/IP) protocol
suite. The base layer of the TCP/IP protocol suite is the
physical layer 80, which defines the mechanical, electrical,
functional and procedural standards for the physical transmission
of data over communications media, such as, for
example, the network connection 45 of FIG. 1A. The
physical layer may comprise electrical, mechanical or functional
standards such as whether a network is packet switching
or frame-switching; or whether a network is based on a
Carrier Sense Multiple Access/Collision Detection (CSMA/CD)
or a frame relay paradigm.

Overlying the physical layer is the data link layer 82. The
data link layer provides the function and protocols to transfer
data between network resources and to detect errors that
may occur at the physical layer. Operating modes at the
datalink layer comprise such standardized network topologies
as IEEE 802.3 Ethernet, IEEE 802.5 Token Ring, ITU
X.25, or serial (SLIP) protocols.

Network layer protocols 84 overlay the datalink layer and
provide the means for establishing connections between
networks. The standards of network layer protocols provide
operational control procedures for internetworking communications
and routing information through multiple heterogenous
networks. Examples of network layer protocols are
the Internet Protocol (IP) and the Internet Control Message
Protocol (ICMP). The Address Resolution Protocol (ARP) is
used to correlate an Internet address and a Media Access
Address (MAC) for a particular host. The Routing Information
Protocol (RIP) is a dynamic routing protocol for passing
routing information between hosts on networks. The Internet
Control Message Protocol (ICMP) is an internal protocol for
passing control messages between hosts on various networks.
ICMP messages provide feedback about events in the
network environment or can help determine if a path exists
to a particular host in the network environment. The latter is
called a "Ping". The Internet Protocol (IP) provides the basic
mechanism for routing packets of information in the Internet.
IP is a non-reliable communication protocol. It provides
a "best efforts" delivery service and does not commit network
resources to a particular transaction, nor does it
perform retransmissions or give acknowledgments.

The transport layer protocols 86 provide end-to-end transport
services across multiple heterogenous networks. The
User Datagram Protocol (UDP) provides a connectionless,
datagram oriented service which provides a non-reliable
delivery mechanism for streams of information. The Transmission
Control Protocol (TCP) provides a reliable session-based
service for delivery of sequenced packets of information
across the Internet. TCP provides a connection oriented
reliable mechanism for information delivery.

The session, or application layer 88 provides a list of
network applications and utilities, a few of which are
illustrated here. For example, File Transfer Protocol (FTP) is
a standard TCP/IP protocol for transferring files from one
machine to another. FTP clients establish sessions through 
TCP connections with FTP servers in order to obtain files. 
Telnet is a standard TCP/IP protocol for remote terminal 
connection. A Telnet client acts as a terminal emulator and 
establishes a connection using TCP as the transport mecha- 
nism with a Telnet server. The Simple Network Management 
Protocol (SNMP) is a standard for managing TCP/IP net- 
works. SNMP tasks, called "agents", monitor network status 
parameters and transmit these status parameters to SNMP 
tasks called "managers." Managers track the status of asso- 
ciated networks. A Remote Procedure Call (RPC) is a 
programming interface which enables programs to invoke 
remote functions on server machines. The Hypertext Transfer
Protocol (HTTP) facilitates the transfer of data objects
across networks via a system of uniform resource identifiers
(URI).

The Hypertext Transfer Protocol is a simple protocol built 
on top of Transmission Control Protocol (TCP). It is the 
mechanism which underlies the function of the World Wide 
Web. The HTTP provides a method for users to obtain data 
objects from various hosts acting as servers on the Internet. 

2.0 Traffic Class 

A traffic class (or "class") is broadly defined as a grouping 
of traffic flows that share the same characteristics. A traffic 
class is defined with one or more matching rules. Traffic 
classes may have the property of being directional, i.e. all 
traffic flowing inbound will belong to different traffic classes 
and be managed separately from traffic flowing outbound. 
The directional property enables asymmetric classification
and control of traffic, i.e., inbound and outbound flows
belong to different classes which may be managed independent
of one another.

Traffic classes may be defined at any level of the IP
protocol as well as for other non-IP protocols. For example,
at the IP level, traffic may be defined as only those flows
between a specified set of inside and outside IP addresses or
domain names. An example of such a low level traffic class
definition would be all traffic between my network and other
corporate offices throughout the Internet. At the application
level, traffic classes may be defined for specific URIs within
a web server. Traffic classes may be defined having "Web
aware" class attributes. For example, a traffic class could be
created such as all URIs matching "*.html" for all servers,
or all URI patterns matching "*.gif" for server X, or for
access to server Y with URI pattern "/sales/*" from client Z,
wherein "*" is a wildcard character, i.e., a character which
matches all other character combinations. Traffic class
attributes left unspecified will simply match any value for
that attribute. For example, a traffic class that accesses data
objects within a certain directory path of a web server is
specified by a URI pattern of the directory path to be
managed, e.g., "/sales/*".
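
The wildcard matching and the "unspecified attribute matches any value" behavior described above can be sketched as follows. This is an illustrative sketch only: the representation of a rule as a Python dictionary, the attribute names `server`, `client` and `uri`, and the function name `rule_matches` are assumptions made for the example. Python's standard `fnmatch.fnmatchcase` is used for the wildcard; it also gives `?` and `[...]` special meaning, which goes beyond the single wildcard character described in the text.

```python
from fnmatch import fnmatchcase


def rule_matches(rule, flow):
    """Return True if every attribute named in the rule matches the flow.

    Attributes absent from the rule are unspecified and match any value.
    """
    for attribute, pattern in rule.items():
        value = flow.get(attribute)
        if value is None or not fnmatchcase(value, pattern):
            return False
    return True


# Hypothetical rules mirroring the three examples in the text.
any_html = {"uri": "*.html"}                                     # all servers
gifs_on_x = {"server": "X", "uri": "*.gif"}                      # server X only
sales_from_z = {"server": "Y", "client": "Z", "uri": "/sales/*"}
```

Under these assumptions, a flow from client Z to server Y for "/sales/q3.html" matches both `sales_from_z` and `any_html`; which class it is finally assigned to depends on the order in which the classes are searched.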

2.1 Classifying Traffic 

The present invention provides a method for classifying 
traffic according to a definable set of classification attributes 
selectable by the manager, including selecting a subset of 
traffic of interest to be classified. The invention provides the 
ability to classify and search traffic based upon multiple 
orthogonal classification attributes. 

Traffic class membership may be hierarchical. Thus, a 
flow may be classified by a series of steps through a traffic 
class tree, with the last step (i.e., at the leaves on the 
classification tree) mapping the flow to a policy. Some 
applications may be classified by application-specific 
attributes as well. For example, web traffic may also be 
classified by HTTP header types such as Content-Type
(MIME type) or User-Agent. Citrix traffic may be classified
by application name or Citrix client name. (A Citrix client
name is the client name used under the Citrix (brand)
technique of client-server interaction.) RTP (real-time
protocol) traffic may be classified by encoding name or
media type.

A classification tree is a data structure representing the 
hierarchical aspect of traffic class relationships. Each node 
of the classification tree represents a class, and has a traffic 
specification, i.e., a set of attributes or characteristics 
describing the traffic associated with it. Leaf nodes of the 
classification tree may contain policies. According to a 
particular embodiment, the classification process checks at 
each level if the flow being classified matches the attributes 
of a given traffic class. If it does, processing continues down 
to the links associated with that node in the tree. If it does 
not, the class at the level that matches determines the policy 
for the flow being classified. If no policy specific match is 
found, the flow is assigned the default policy. 

In a preferred embodiment, the classification tree is an 
N-ary tree with its nodes ordered by specificity. For 
example, in classifying a particular flow in a classification 
tree ordered first by organizational departments, the 
attributes of the flow are compared with the traffic specifi- 
cation in each successive department node and if no match 
is found, then processing proceeds to the next subsequent 
department node. If no match is found, then the final 
compare is a default "match all" category. If, however, a 
match is found, then classification moves to the children of 
this department node. The child nodes may be ordered by an 
orthogonal paradigm such as, for example, "service type." 
Matching proceeds according to the order of specificity in 
the child nodes. Processing proceeds in this manner, tra- 
versing downward and from left to right in FIGS. 2A and 2B, 
which describe a classification tree, searching the plurality 
of orthogonal paradigms. Key to implementing this hierarchy
is that the nodes are arranged in decreasing order of
specificity. This permits the search to find the most specific class
for the traffic before more general ones.
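
The traversal described above can be sketched as follows. This is a simplified illustration and not the patented implementation: the class name `TrafficClass`, the use of a Python predicate as the traffic specification, and the flow attributes `inside_subnet` and `service` are all assumptions made for the example. Children are held in a list ordered most specific first, with a "match all" default last.

```python
class TrafficClass:
    """One node of a classification tree (hypothetical representation)."""

    def __init__(self, name, matches, children=(), policy=None):
        self.name = name
        self.matches = matches            # predicate: flow dict -> bool
        self.children = list(children)    # ordered most specific first
        self.policy = policy


def classify(node, flow):
    """Descend from node, taking the first child whose rule matches."""
    for child in node.children:
        if child.matches(flow):
            return classify(child, flow)
    return node                           # no child matched: this is the class


def match_all(flow):
    return True


tree = TrafficClass("root", match_all, children=[
    TrafficClass("Department A", lambda f: f["inside_subnet"] == "A", children=[
        TrafficClass("FTP", lambda f: f["service"] == "ftp", policy="ftp-policy"),
        TrafficClass("Web", lambda f: f["service"] == "http", policy="web-policy"),
    ]),
    TrafficClass("Department B", lambda f: f["inside_subnet"] == "B"),
    TrafficClass("Default", match_all, policy="default-policy"),  # always last
])
```

Because the default node sits last among its siblings, it is reached only after every more specific sibling has failed to match. In this sketch a flow that matches a department but none of its children stays at the department node; falling back to a default policy for such a node is left out for brevity.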

The table below depicts exemplary components from 
which traffic classes may be built. Note that the orientation 
of the server (inside or outside) may be specified. As noted 
above, any traffic class component may be unspecified, i.e.,
set to match any value.



Components of a Traffic Class Specifier

Inside                 Global                             Outside
(Client or Server)                                        (Server or Client)

IP Address/            TCP or UDP Service,                IP Address/
Domain Name            e.g., WWW, FTP,                    Domain Name
Port Number            RealAudio, etc.                    Port Number
MAC Address            URI pattern for Web Service        MAC Address
                       MIME type for Web Service
                       IPX Service
                       SNA Service
                       LAT Service
                       IP precedence
                       Application Specific Attributes



As an example, FIGS. 2A and 2B depict representative
classifications of traffic made by a hypothetical network
manager in order to accomplish particular allocations of
bandwidth. In FIG. 2A, the network manager has decided to
divide her network resources first by allocating bandwidth
between Departments A and B. FIG. 2A shows the resulting
classification tree 201, in which Department A bandwidth
resources 202 and Department B bandwidth resources 204
each have their own nodes representing a specific traffic
class for that department. Each traffic class may have one or
more rules which are used for determining whether a flow
matches that traffic class. For example, in FIG. 2A, the
Department A resources node 202 has the matching rule
Inside Host Subnet A associated with it. Next, the network
manager has chosen to divide the bandwidth resources of
Department A between two applications. She allocates an
FTP traffic class 206 and a World Wide Web server traffic
class 208. Each of these nodes may have a separate policy
attribute associated with it. For example, in FIG. 2A, the
FTP node 206 has an attribute Outside port 20 associated
with it. Similarly, the network manager has chosen to divide
network bandwidth resources of Department B into an FTP
server traffic class 210 and a World Wide Web server traffic
class 212. Each may have its own respective policy.

FIG. 2B shows a second example 203, wherein the 
network manager has chosen to first divide network band- 
width resource between web traffic and TCP traffic. She 
creates three traffic nodes, a web traffic node 220, a TCP
traffic node 224 and a default node 225. Next, she divides the
web traffic among two organizational departments by cre- 
ating a Department A node 226, and a Department B node 
228. Each may have its own associated policy. Similarly, she
divides TCP network bandwidth into separate traffic classes
by creating a Department A node 230 and a Department B
node 232. Each represents a separate traffic class which may
have its own policy.

All traffic which does not match any user specified traffic
class falls into an automatically created default traffic class
which has a default policy. In FIG. 2A, the default category
is depicted by a default node 205, and in FIG. 2B, the default
category is depicted by a default node 225.

3.1 Traffic Discovery

Network traffic is classified under existing classes, beginning
with the broadest classes, in inbound and outbound
traffic classes which are protocol layer independent. For
example, a particular instance of traffic may be classified
according to its transport layer characteristics, e.g., Internet
Protocol port number, as well as its application layer
information, e.g., SMTP. In addition to application layer
information, there may be attributes of the application upon
which traffic classification can be based. For example, with
SMTP, an attribute might be the length of the message or
header information of the message or whether attachments
are present. For an end-user application, such as a Lotus
Notes database manager, the type of data might also be the
basis of traffic classification. The key is that the flow
specification must have some of its attributes discoverable in
the course of the classification processing. Characteristics
such as MIME types may also be discovered. Standard
protocols, such as IPX and SNA, and services, such as SMTP
and FTP, are recognized for discovery. Classification is
performed to the most specific level determinable. For
example, in select embodiments, non-IP traffic, such as
SNA, may be classified only by protocol, whereas within
Internet Protocol, TCP or UDP traffic may be classified to
the service level as indicated in the "/etc/services" file.
Classification beyond a determined terminal classification
level is not performed. For example, in a select embodiment,
a class matching "ipx" or "nntp" will not be further classified.

3.1.1 Service Aggregates

A service aggregate is provided for certain applications
that use more than one connection in a particular conversation
between a client and a server. For example, an FTP
client in conversation with an FTP server employs a command
channel and a data transfer channel, which are distinct
TCP sessions on two different ports. In cases where two or
three TCP or UDP sessions exist for each conversation
between one client and one server, it is useful to provide a
common traffic class, i.e., the service aggregate, containing
the separate conversations. In practice, these types of conversations
are often between the same two hosts, but using
different ports. According to the invention, a class is created
with a plurality of matching rules, each matching various
component conversations.

3.1.2 Classification Under Specified Criterion

Classification of traffic into a tree is performed by traversing
the tree of traffic classes, starting at the root and
proceeding through each child of the root, comparing the
flow being classified against the matching rules associated
with each traffic class. The flow is defined as "matching" a
class if its characteristics match any one of the matching
rules that is used to define the class. When the flow matches
a class, then if that traffic class has children, the flow will be
compared against each of the children to determine if there
is a more specific match; otherwise, the processing stops
and the flow is assigned to that traffic class. A marker is
placed in match_all default nodes so that when match
[illegible] reaches the mark, the classification processing
depicted in flowchart 403 terminates, and the flow is
[illegible] was reached.

3.1.3 Default Suggested Policies

A default policy may be suggested or, in select
embodiments, automatically applied, to a traffic class which
has been discovered. Applying suggested or default policies
for a new class at a user's option is described in a copending,
commonly owned, U.S. patent application Ser. No. 09/198,051,
entitled, "Method for Automatically Determining a
Traffic Policy in a Packet Communications Network," which
is incorporated herein by reference in its entirety for all 
purposes. 

3.1.4 Analysis of Data in Determining Traffic Class 

In a preferable embodiment, classification can extend to 
examination of the data contained in a flow's packets. 
Certain traffic may be distinguished by a signature even if it
originates with a server run on a non-standard port. For
example, an HTTP conversation on port 8080 would not
otherwise be determinable as HTTP from the port number.
Further analysis of the data is conducted in order to deter- 
mine classification in instances where: 1) FTP commands 
are used to define server ports, 2) HTTP protocol is used for 
non-web purposes. The data is examined for indication of 
push traffic, such as PointCast Network-type traffic (a type of 
traffic marketed by InfoGate of San Diego, Calif.), which 
uses HTTP as a transport mechanism. These uses may be 
isolated and classified into a separate class. Marimba and 
PointCast can be distinguished by looking into the data for a 
signature content header in the get request. PointCast has 
URLs that begin with "/FIDO-1/." Other applications in
which protocol can be inferred from data include Telnet 
traffic. Both tn3270 and tn3270E (emulation) may be 
detected by looking into data and given a different class. 
Telnet traffic has option negotiations which may indicate an 
appropriate class. 
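The data-based classification described above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `classify_payload`, the returned class names, and the rule that an HTTP request line has exactly three space-separated parts are assumptions made for illustration. Only the "/FIDO-1/" URL prefix and the port-8080 scenario come from the text.

```python
def classify_payload(server_port: int, payload: bytes) -> str:
    """Guess a traffic class from the first bytes of a TCP flow (hypothetical helper)."""
    # Look only at the first line of the payload, e.g. b"GET /x HTTP/1.0".
    first_line = payload.split(b"\r\n", 1)[0]
    parts = first_line.split(b" ")
    # Assumption: an HTTP request line is "<method> <path> HTTP/<version>".
    is_http = len(parts) == 3 and parts[2].startswith(b"HTTP/")
    if is_http:
        path = parts[1]
        # Signature taken from the text: PointCast URLs begin with /FIDO-1/.
        if path.startswith(b"/FIDO-1/"):
            return "PointCast"
        return "HTTP"
    # No signature recognized: fall back to a generic port-based class name.
    return f"TCP-Port-{server_port}"
```

With this sketch, an HTTP conversation on port 8080 is still classified as HTTP because the decision is made from the data rather than from the port number.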
3.1.5 Identity of Traffic Based Upon Resource Creator's
Class

A flow's traffic class may be inferred by observing the 
existence of other flows that are known to be related, 
between the same two nodes as the flow being classified. 
This method is used to detect Real Time Protocol (RTP) for
point-to-point telephony, RTP for broadcast streaming,
CCITT/ITU H.323 Internet telephony (bidirectional), and
RTSP, the Real Time Streaming Protocol (unidirectional).

3.1.6 Dynamic Ports 

Some applications establish connections to a well-known 
port number. Other applications use dynamic ports, by first 
connecting to a well-known port number, and then being 
redirected to another port number which is random or 
dynamically generated. For example, in a database 
application, a client may connect to the database server's 
well-known port number. At this location, a load-balancing
server may be running which is aware of all of the other port 
numbers that are listened to by instances of the database 
application. The load-balancing server will redirect the 
client to the port number of the least-loaded database server 
instance. 

3.2 Traffic Discovery Processing 

FIG. 3 depicts components of a system for discovering
traffic according to the invention. A traffic tree 302 is
provided in which new traffic will be discovered under a
particular member class node. The traffic tree may have a
hierarchy of nodes (Class A, B, C) and corresponding sub-nodes
under the nodes. A traffic classifier 304 detects services
for incoming traffic. Alternatively, the classifier may
start with a service and determine the hosts using it. A
knowledge base 306 contains heuristics for determining
traffic classes. The knowledge base may be embodied in
memory, file, executable code, or a database. In a preferred
embodiment, the knowledge is contained within a data
structure resident in memory, and in executable code. A
plurality of saved lists 308 stores identifying characteristics
of classified traffic pending incorporation into traffic tree
302. In select embodiments, entries for each instance of
traffic may be kept in one of the plurality of saved lists, each
of which is associated with a traffic class which is marked to
indicate that discovery is enabled on it. If there are attributes
that are specific to an application, the entries may contain
such attributes. In alternate embodiments, a copy of an entry
and a count of duplicate copies for the entry are maintained.

FIG. 4A depicts a flowchart 401 of processing steps for
discovering traffic. In a step 402, a flow specification is
parsed from the flow being classified. The flow specification
may include attributes of the application associated with the
flow. Then in a step 404, the flow specification parsed from
the flow in step 402 is compared with the traffic specifications
in each node of the classification tree. Rules are
checked starting from most specific to least specific. In a
decisional step 406, a determination is made if traffic
matches one of the classes that are marked for discovery. If
this is so, then in a step 408, an entry is made in a list of
identifying characteristics, such as protocol type, IP protocol
number, server port, traffic type if known, application-specific
attributes, or a time of occurrence of the traffic. In
an optional step 410, duplicate instances having the same
identifying characteristics are suppressed, in favor of keeping
a count of the duplicates and a most recent time traffic
with these identifying characteristics was encountered. In an
optional step 412, a byte count of traffic of this type that has
been detected is included. It should be noted that as a result of the
traffic classification process, a flow will always match some
traffic class. If it does not match anything specific, it will
match a match-all (default) class. Also, it should be noted
that if a flow matches a class that is marked for discovery,
information about the flow will always be recorded in the list
of saved characteristics.
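The steps of flowchart 401 can be sketched roughly as follows. This is an illustrative sketch only: the names `SavedEntry`, `TrafficClass`, and `classify_and_record`, the dictionary-based flow specification, and the choice of (protocol, server port) as the identifying key are assumptions, not structures disclosed in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

FlowSpec = Dict[str, object]          # hypothetical parsed flow specification
Key = Tuple[object, ...]              # identifying characteristics of an entry


@dataclass
class SavedEntry:
    count: int = 0                    # duplicates are counted, not stored (step 410)
    last_seen: float = 0.0            # most recent time this traffic was seen
    byte_count: int = 0               # optional byte count (step 412)


@dataclass
class TrafficClass:
    name: str
    matches: Callable[[FlowSpec], bool]
    discover: bool = False            # class is marked for discovery
    saved: Dict[Key, SavedEntry] = field(default_factory=dict)


def classify_and_record(classes: List[TrafficClass], spec: FlowSpec,
                        now: float, nbytes: int) -> TrafficClass:
    # Classes are assumed to be ordered most specific first, with a
    # match-all (default) class last, so some class always matches.
    for cls in classes:
        if cls.matches(spec):
            if cls.discover:
                key = (spec.get("protocol"), spec.get("server_port"))
                entry = cls.saved.setdefault(key, SavedEntry())
                entry.count += 1
                entry.last_seen = now
                entry.byte_count += nbytes
            return cls
    raise ValueError("class list must end with a match-all class")
```

Keeping one entry per set of identifying characteristics, with a count and a last-seen time, is one way to suppress duplicates while retaining what a later incorporation pass would need.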

In a preferred embodiment, processing according to flow- 
chart 401 may execute on multiple instances of saved list 
308. 

3.2.1 Displaying Results to a User 

In an optional step 413 (not shown), after the processing 
of flowchart 401 completes or at periodic intervals or on 
demand, a list of traffic classes produced in steps 402 
through 412 is displayed to a network manager. The list may 
be sorted by any well-known criteria such as: 1) most "hits" 
during a recent interval, 2) most recently-seen (most recent 
time first), 3) most data transferred (bytes/second) during 
some interval, or a moving average. The user may choose an 
interval length or display cutoff point (how many items, how 
recent, at least B bytes per second, or other thresholds). The 
network manager may then take some action (e.g., pushing
a button) to select the traffic types she wishes to add to the 
classification tree. The display can be hierarchical, as 
depicted in lines (3) below: 

FTP                                   (3)
  FTP-cmd
  FTP-data
Lotus
  Lotus_database1
  Lotus_database2
    Lotus_database2_video
to-host1
  tcp
    FTP
      FTP-cmd
      FTP-data
    HTTP
      images
      java
      text
    TCP-port-9999

wherein the "TCP-port-9999" entry is a traffic class which
was discovered as a result of an application which was
making repeated or simultaneous connections to TCP port
9999, and for which there was no other information available
to allow matching on a specific class for that application
(it was not an application known in the knowledge base).

The italicized terms are examples of sub-nodes with
application-specific characteristics.

In a related embodiment, a threshold for display or class 
creation of well-known traffic types is provided. 
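The sorting criteria and display cutoff described for the user display can be sketched as below. The function name `sort_for_display`, the dictionary keys (`hits`, `last_seen`, `bytes_per_sec`), and the `limit` parameter are hypothetical names chosen for this illustration.

```python
def sort_for_display(entries, criterion="hits", limit=None):
    """Order discovered traffic entries for display (hypothetical helper).

    criterion: "hits"   -> most hits during a recent interval first
               "recent" -> most recently seen first
               "bytes"  -> most data transferred (bytes/second) first
    limit:     optional display cutoff (how many items to show)
    """
    sort_keys = {
        "hits": lambda e: e["hits"],
        "recent": lambda e: e["last_seen"],
        "bytes": lambda e: e["bytes_per_sec"],
    }
    ranked = sorted(entries, key=sort_keys[criterion], reverse=True)
    return ranked if limit is None else ranked[:limit]
```

A network manager could then pick items from the ranked list to add to the classification tree.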

3.2.2 Interval Based Incorporation 

In an alternative embodiment, at select intervals of time, 
items in the saved list of traffic characteristics are analyzed, 
and either 1) recognized and a corresponding traffic class is 
added to the tree, or 2) (for repeated attempts to request a 
server connection port, IP subprotocol type, or ethertype that 
is not otherwise known in the knowledge base, upon exceed- 
ing a certain threshold) a class for the traffic is created and 
added to the classification tree. 
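The threshold test for traffic that is not known in the knowledge base can be sketched as a sliding-window hit counter. This is a sketch under stated assumptions: the class `PortUsage`, its method names, and the parameters `min_hits` and `window_seconds` are hypothetical, and the generic class name follows the "TCP-port-9999" style shown in the display example.

```python
from collections import defaultdict, deque


class PortUsage:
    """Track hits per server port and report ports that look 'well used'."""

    def __init__(self, min_hits: int, window_seconds: float):
        self.min_hits = min_hits              # assumed threshold: at least n hits
        self.window = window_seconds          # assumed window: within the last s seconds
        self.hits = defaultdict(deque)        # port -> timestamps of recent hits

    def hit(self, port: int, now: float) -> None:
        self.hits[port].append(now)

    def classes_to_create(self, now: float) -> list:
        """Run at each interval: expire old hits, name ports over the threshold."""
        names = []
        for port in sorted(self.hits):
            stamps = self.hits[port]
            # Discard entries older than the window.
            while stamps and stamps[0] < now - self.window:
                stamps.popleft()
            if len(stamps) >= self.min_hits:
                names.append(f"TCP-Port-{port}")
        return names
```

Running `classes_to_create` at select intervals mirrors the idea that saved characteristics are analyzed periodically rather than on every packet.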

FIG. 4B depicts a flowchart 403 of the processing steps
for integrating traffic classes into a classification tree in an
embodiment. Processing steps of flowchart 403 create new
classes in the classification tree, based upon the information
saved in the lists of traffic characteristics. This is done
periodically at a defined interval of seconds, such as 30
seconds in a preferred embodiment. In a step 420, an instance
of saved traffic is retrieved from the saved traffic list 308.
Thereafter, a test is made to determine if discovery is
occurring under a class for which discovery of attributes
occurs (that is, a class which corresponds to an application
which has application-specific attributes which may be
discovered) (step 421). If true, then the process proceeds
directly to the step of creating a node for a traffic class for
the next attribute that was saved (step 424). Elements on the
list of discoverable attributes are assigned a processing
priority relative to one another. It is a characteristic of the
product that the processing priority ("ordering") may be
pre-defined or may be configurable by the user. (While the
characteristic of order is a part of the invention, the mechanism
for fixing or for reconfiguring the ordering is not a part
of this invention.) If the process is not in a state of discovering
attributes, then in an alternative decisional step 422,
the instance of saved traffic is examined to determine
whether it is well-known (e.g., protocol type, assigned port
number) and a name representing its type exists. If this is so
then processing continues with a test of whether the saved
traffic belongs to a service aggregate in step 426. Otherwise,
in a step 423 the instance of saved traffic is examined to
determine whether it appears to be a server connection port
of an unregistered IP port (or a port that has not been
configured). If this is not so, then processing continues with
the next traffic class in the saved list in step 420. In
decisional step 426, the instance of saved traffic is examined
to determine whether it belongs to a service aggregate. For
example, an FTP session has one flow that is used to
exchange commands and responses and a second flow that
is used to transport data files. If the traffic does belong to a
service aggregate, then in a step 428, a traffic class is created
which will match all components of the service aggregate. In
a step 425, a new traffic class is created to match the instance
of saved traffic.

In an optional step (not shown), a suggested policy is
determined for the traffic class created in step 425. Next, in
a decisional step 432, a limit is checked to verify that the
total number of traffic classes has not exceeded a specified
maximum. If the limit on classes has not been reached, then
the traffic is checked to determine if there are still attributes
for which classes have not been discovered (step 434) and if
so, then the attributes are retained in the list (step 436). In
either case, the process is repeated from step 420.

In a related embodiment in place of steps 424, 425 or 428,
a display of traffic classes, sorted by most recently used,
most hits, number of bytes received during any interval,
which is determined by a plurality of time stamps, is
available on demand to a network manager. The network
manager then manually indicates that the traffic is to be
added to the tree.

In a particular embodiment a threshold is employed to
determine traffic for which a separate class should be added.
A minimum usage threshold indicates whether a particular
port has been used at least n times in the last s seconds. (This
applies only in those instances where there is an identifiable
port.) If traffic is well known, e.g., SMTP, a new traffic class
is created immediately, i.e., threshold is equal to one hit per
minute; otherwise, the threshold is set equal to a fixed,
arbitrarily-configured value, for example, two to ten thousand
hits per minute. A new class for traffic is given a generic
name, e.g., TCP-Port-99. Entries for traffic over a certain
age, for example one minute old, are discarded.

In a related embodiment, another method of identifying
an individual traffic class is to detect simultaneous connections
to the same host port from different clients. This
provides an indication that the port is a well-known
connection port.

Traffic classes are created for any combination of the
above-mentioned categories. A flag is added to all traffic
classes so created in order to indicate that it is the product
of the auto classifier.

3.2 Command Language Interface:

In a particular embodiment, function of the classifier 304
is controlled by a command language interface. Below is a
plurality of command language interface commands.

setup discover {on|off} To activate autoclassification for
various classes to detect well-known protocols and services:

class discover <tclass> {inside/outside/both} To turn on
autoclassification (a.k.a. discovery) under a class to detect
services with the host on the inside, the outside, or both
directions.

class discover <tclass> off To turn off use.

The new classes have names in the format of lines below:

<service> or
<protocol>_Port_<number> or
<service>_<attribute>[_<attribute>_<attribute> . . . ]

where <protocol> is either TCP or UDP.

If a heretofore unknown server-connection port appears to
be "well used", an entry of the second type is created. The
threshold for creation is for example 11 hits with no more
than 1 minute (granularity of checking is at least 30 seconds
between running successive discovery or autoclassification
processes) between any two hits. For example:

inbound/inside/ftp
http
sna
...
Lotus_database2_video

3.3 Syntax of Traffic Specifications (a.k.a. Matching
Rules):

Flow specifications and traffic specifications may have an
inside service field and an outside service field. (For some
protocols or service types, inside and outside are not
distinguished.) Each will have values of SVC_UNKNOWN
(0), SVC_CLIENT (1), or a number greater than 1, which
is the service index, an index into the global table
gServiceTable. If a type of service is known for a connection, the
service field at a particular side will be set to SVC_CLIENT
and the service field at the opposite side will be the index
into gServiceTable. If a type of service is not known for the
traffic, both inside service field and outside service field will
be SVC_UNKNOWN. A person of reasonable skill in the
art will appreciate that other embodiments for the table, such
as representing the information contained therein as text
strings or by any one of a plurality of possible encoding
schemes, are realizable without departing from the present
invention.

Therefore, a traffic specification can have "outside
service:http" (or just "outside HTTP") which is different than
"outside tcp port:80". The first will match HTTP on any port
while the second will match anything on port 80 (including
PointCast and Marimba).

Specifying an aggregate traffic specification
"service:<agg>" identifies the traffic specifications for various
traffic belonging to the service. Specifying "class new
inbound cdls outside dls" is the same as "class new inbound
cdls outside service:dls-wpn" and "class tspec add cdls
outside service:dls-rpn". Most services which are known in
the knowledge base will create a class that encompasses all
of the members of the aggregate.
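The difference between matching on a service and matching on a port can be sketched as follows. The constants SVC_UNKNOWN and SVC_CLIENT and the idea of a global service table come from the text; the Python name `g_service_table`, its contents, the dictionary layout of flows and traffic specifications, and the function `spec_matches` are assumptions made for this illustration.

```python
SVC_UNKNOWN = 0
SVC_CLIENT = 1

# Hypothetical table contents; indices 0 and 1 are reserved for the
# two constants above, so real services start at index 2.
g_service_table = ["unknown", "client", "http", "ftp"]


def spec_matches(tspec: dict, flow: dict) -> bool:
    """Return True if the flow satisfies every field present in the traffic spec."""
    want_service = tspec.get("outside_service")
    if want_service is not None:
        index = flow["outside_service"]
        # A service match needs a real service index (greater than 1).
        if index <= SVC_CLIENT or g_service_table[index] != want_service:
            return False
    want_port = tspec.get("outside_port")
    if want_port is not None and flow["outside_port"] != want_port:
        return False
    return True
```

Under these assumptions, a specification naming the service matches HTTP on any port, while a specification naming port 80 matches whatever runs on that port.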

Network managers need not be aware of services which 
are known to be derivative of others, e.g., PointCast and 
Marimba are special cases of HTTP and tn3270 is a special 
case of Telnet, in order to work with the system. 

4.0 Conclusion

In conclusion, the present invention provides for an
automatic determination of a policy for a packet
telecommunications system wherein bandwidth is allocated to
requesting flows according to automatically determined
application requirements. An advantage of traffic classification
techniques according to the present invention is that
network managers need not know the technical aspects of
each kind of traffic in order to configure traffic classes. A
further advantage of the present invention is that traffic
classes may include application-specific attributes such as a
MIME type for web traffic.

Other embodiments of the present invention and its individual
components will become readily apparent to those
skilled in the art from the foregoing detailed description. As
will be realized, the invention is capable of other and
different embodiments, and its several details are capable of
modifications in various obvious respects, all without
departing from the spirit and the scope of the present
invention. Accordingly, the drawings and detailed description
are to be regarded as illustrative in nature and not as
restrictive. It is therefore not intended that the invention be
limited except as indicated by the appended claims.

What is claimed is: 

1. A method for automatically classifying traffic in a
packet communications network, said network having any
number of flows, including zero, comprising the steps of:

parsing a packet into a first flow specification, wherein
said first flow specification contains at least one
instance of any one of the following:
a protocol family designation,
a direction of packet flow designation,
a protocol type designation,
a pair of hosts,
a pair of ports,
in HTTP protocol packets, a pointer to a MIME type;
thereupon,

matching the first flow specification of the parsing step
to a plurality of classes represented by a plurality of
nodes, each node having a traffic specification;
thereupon,

if a matching node was not found in the matching step,
associating said first flow specification with one or
more newly-created nodes; thereupon,

incorporating said newly-created node into said plurality
of nodes.

2. A method for automatically classifying traffic in a
packet communications network, said network having any
number of flows, including zero, comprising the steps of:

determining application type of a flow; thereafter
for said application type of said flow, parsing a packet of
said flow into a first flow specification, said first flow
specification containing information as attributes, said
attributes being specific to said application type and
wherein selected ones of said attributes are discoverable;
thereupon
matching the first flow specification of the parsing step to
a plurality of classes represented by a plurality of nodes
of a classification tree type, each said classification tree
type node having a traffic specification; thereupon

if a matching classification tree type node was found in
the matching step and said matching classification tree
type node indicates that further nodes can be created as
a consequence of attributes thereof that are
discoverable, then

creating at least one new classification tree type node;
thereupon

associating said first flow specification with said at least
one newly-created classification tree type node; and
thereupon

incorporating said at least one newly-created classification
tree type node into said plurality of classification
tree type nodes so that policies can be applied to traffic
based only on said discoverable attributes of said at
least one newly-created classification tree type node.

3. The method of claim 2 wherein said discoverable 
attributes are assigned a processing priority relative to one 
another. 

4. The method of claim 2 further comprising the steps of:
for at least a second flow having a second flow
specification, recognizing said second flow specification
and said first flow specification to comprise
together a service aggregate; thereupon
associating said first flow specification and said second
flow specification with a newly-created classification
tree node, said newly-created classification tree type
node having a first traffic specification corresponding to
said first flow specification and a second traffic specification
corresponding to said second flow specification.

5. The method of claim 2 further comprising the steps of:
applying policies from said newly-created classification
tree type nodes to instances of detected traffic.

6. The method of claim 2 further comprising the steps of: 
for a subclassification under a specified criterion com- 
prising a specified attribute name and a value, if a 
matching classification tree type node was found in the 
matching step, said matching classification tree type 
node having at least one child classification tree type 
node, applying the matching, associating, and incorpo- 
rating steps to a particular child classification tree type 
node of said matching classification tree type node as a 
part of classification. 

7. The method of claim 2 wherein the parsing step further 
comprises the steps of: 

examining data contained within a plurality of component 
packets belonging to said first flow for any number of 
a plurality of indicators of any of the following: 

a protocol; 

a service; thereupon, matching said plurality of indicators 
to said classes represented by a plurality of said clas- 
sification tree type nodes. 

8. The method of claim 2 further including measuring 
traffic load and invoking said classification upon achieve- 
ment of a minimum usage threshold. 

9. The method according to claim 2 wherein said matching
step is applied to hierarchically-recognized classes.

10. A system for automatically classifying traffic in a 
packet telecommunications network, said network having 
any number of flows, including zero, comprising: 

a plurality of network links upon which said traffic is 
carried; 

a network routing means; and 
a processor means operative to:

determine application type of a flow;

for said application type of said flow, parse a packet of
said flow into a first flow specification, said first flow
specification containing information as attributes, said
attributes being specific to said application type and
wherein selected ones of said attributes are discoverable;
thereupon

match the first flow specification of the parsing step to a
plurality of classes represented by a plurality of nodes
of a classification tree type, each said classification tree
type node having a traffic specification; thereupon

if a matching classification tree type node was found in
the matching step and said matching classification tree
type node indicates that further nodes can be created as
a consequence of attributes thereof that are
discoverable, then

associate said first flow specification with said at least one
newly-created classification tree type node; thereupon

create at least one new classification tree type node; and
thereupon

incorporate said at least one newly-created classification
tree type node into said plurality of classification tree
type nodes so that policies can be applied to traffic
based only on said discoverable attributes of said at
least one newly-created classification tree type node.

11. The system of claim 10 wherein said processor means
is further operative to include measuring traffic load and
invoking said classification upon achievement of a minimum
usage threshold.

12. The system according to claim 10 wherein said 
processor means is further operative to apply said matching 
step to hierarchically-recognized classes. 
13. A method for classifying traffic in a packet telecommunications
network, said network having any number of
flows, including zero, said method comprising the steps of:

parsing a packet into a first flow specification, said first 
flow specification having discoverable attributes; 
thereupon, 

matching the first flow specification of the parsing step to 
a plurality of classes represented by a plurality of 
classification tree type nodes, each said classification 
tree type node having a traffic specification; thereupon, 

if a matching classification tree type node was found in 
the matching step and said matching classification tree 
type node indicates through said discoverable attributes 
that further nodes can be created, creating at least one 
new classification tree type node; thereupon 

associating said first flow specification with at least one 
more newly-created node; thereupon, 

displaying to a network administrator a representation of 
traffic according to said traffic specification for use in 
manual intervention. 

14. The method according to claim 13 further including 
the step of sorting said traffic representation according to 
most recently occurring. 

15. The method according to claim 13 further including 
the step of sorting said traffic representation according to 
most data transferred for a preselected period of time. 

16. The method of claim 13 further including measuring 
traffic load and invoking said classification upon achieve- 
ment of a minimum usage threshold. 

17. The method according to claim 13 wherein said 
matching step is applied to hierarchically-recognized 
classes. 

* * * * *
ABSTRACT



In a packet communication environment, a method is pro- 
vided for automatically classifying packet flows for use in 
allocating bandwidth resources by a rule of assignment of a 
service level. The method comprises applying individual 
instances of traffic classification paradigms to packet net- 
work flows based on selectable information obtained from a 
plurality of layers of a multi-layered communication proto- 
col in order to define a characteristic class, then mapping the 
flow to the defined traffic class. It is useful to note that the 
automatic classification is sufficiently robust to classify a 
complete enumeration of the possible traffic. 

15 Claims, 7 Drawing Sheets 
METHOD FOR AUTOMATICALLY
CLASSIFYING TRAFFIC IN A PACKET 
COMMUNICATIONS NETWORK 

CROSS-REFERENCES TO RELATED 
APPLICATIONS 

This application claims priority from a commonly owned
U.S. Provisional Patent Application, Ser. No. 60/066,864,
filed on Nov. 25, 1997, in the name of Guy Riddle and Robert
L. Packer, entitled "Method for Automatically Classifying
Traffic in a Policy Based Bandwidth Allocation System."

The following related commonly-owned
contemporaneously-filed co-pending U.S. Patent Application
is hereby incorporated by reference in its entirety for all
purposes: U.S. patent application Ser. No. 09/198,051, still
pending, in the name of Guy Riddle, entitled "Method for
Automatically Determining a Traffic Policy in a Packet
Communications Network."

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright 
rights whatsoever. 

Further, this application makes reference to the following
commonly owned U.S. patents and patent applications, which are
incorporated by reference herein in their entirety for all purposes:
U.S. Pat. No. 5,802,106, in the name of Robert L. Packer, 
entitled "Method for Rapid Data Rate Detection in a 
Packet Communication Environment Without Data 
Rate Supervision," relates to a technique for automati- 
cally determining the data rate of a TCP connection; 
U.S. patent application Ser. No. 08/977,376, now U.S. 
Pat. No. 6,046,980, in the name of Robert L. Packer, 
entitled "Method for Managing Flow Bandwidth Uti- 
lization at Network, Transport and Application Layers 
in Store and Forward Network," relates to a technique 
for automatically allocating bandwidth based upon data 
rates of TCP connections according to a hierarchical 
classification paradigm; and
U.S. patent application Ser. No. 08/742,994, now U.S. 
Pat. No. 6,038,216 in the name of Robert L. Packer, 
entitled "Method for Explicit Data Rate Control in a 
Packet Communication Environment Without a Data 
Rate Supervision," relates to a technique for automati- 
cally scheduling TCP packets for transmission. 

BACKGROUND OF THE INVENTION 

This invention relates to digital packet
telecommunications, and particularly to management of
network bandwidth based on information ascertainable from
multiple layers of the OSI network model. It is particularly
useful in conjunction with bandwidth allocation mechanisms
employing traffic classification in a digitally-switched
packet telecommunications environment, as well as in
monitoring, security and routing.

The ubiquitous TCP/IP protocol suite, which implements 
the world-wide data communication network environment 
called the Internet and is also used in private networks 
(Intranets), intentionally omits explicit supervisory function 
over the rate of data transport over the various media which 
comprise the network. While there are certain perceived 
advantages, this characteristic has the consequence of juxtaposing
very high-speed packet flows and very low-speed
packet flows in potential conflict for network resources,
which results in inefficiencies. Certain pathological loading
conditions can result in instability, overloading and data
transfer stoppage. Therefore, it is desirable to provide some
mechanism to optimize efficiency of data transfer while
minimizing the risk of data loss. Early indication of the rate
of data flow which can or must be supported is imperative.
In fact, data flow rate capacity information is a key factor for
use in resource allocation decisions. For example, if a
particular path is inadequate to accommodate a high rate of
data flow, an alternative route can be sought out.

Internet/Intranet technology is based largely on the TCP/IP
protocol suite, where IP, or Internet Protocol, is the
network layer protocol and TCP, or Transmission Control
Protocol, is the transport layer protocol. At the network
level, IP provides a "datagram" delivery service. By contrast,
TCP builds a transport level service over the datagram
service to provide guaranteed, sequential delivery of a byte
stream between two IP hosts.

TCP flow control mechanisms operate exclusively at the
end stations to limit the rate at which TCP endpoints emit
data. However, TCP lacks explicit data rate control. The
basic flow control mechanism is a sliding window, superimposed
on a range of bytes beyond the last explicitly-acknowledged
byte. Its sliding operation limits the amount
of unacknowledged transmissible data that a TCP endpoint
can emit.

Another flow control mechanism is a congestion window,
which is a refinement of the sliding window scheme, which
employs conservative expansion to fully utilize all of the
allowable window. A component of this mechanism is
sometimes referred to as "slow start".

The sliding window flow control mechanism works in
conjunction with the Retransmit Timeout Mechanism
(RTO), which is a timeout to prompt a retransmission of
unacknowledged data. The timeout length is based on a
running average of the Round Trip Time (RTT) for
acknowledgment receipt, i.e. if an acknowledgment is not received
within (typically) the smoothed RTT+4*mean deviation,
then packet loss is inferred and the data pending
acknowledgment is retransmitted.
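The timeout rule above (smoothed RTT plus four times the mean deviation) can be illustrated with a short sketch. The function name `update_rto` and the smoothing gains `alpha` and `beta` are assumptions; the gains 1/8 and 1/4 are values commonly used in TCP implementations and are not stated in the text.

```python
def update_rto(srtt: float, rttvar: float, sample: float,
               alpha: float = 0.125, beta: float = 0.25):
    """Fold one RTT sample into the running estimates and return the new timeout.

    srtt   -- smoothed (running average) round trip time
    rttvar -- running mean deviation of the round trip time
    sample -- newly measured round trip time
    """
    err = sample - srtt
    srtt = srtt + alpha * err                      # update the running average
    rttvar = rttvar + beta * (abs(err) - rttvar)   # update the mean deviation
    rto = srtt + 4 * rttvar                        # smoothed RTT + 4 * mean deviation
    return srtt, rttvar, rto
```

If no acknowledgment arrives within the returned timeout, a sender following this rule would infer packet loss and retransmit.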

Data rate flow control mechanisms which are operative
end-to-end without explicit data rate control draw a strong
inference of congestion from packet loss (inferred, typically,
by RTO). TCP end systems, for example, will "back-off",
i.e., inhibit transmission in increasing multiples of the base
RTT average as a reaction to consecutive packet loss.

Bandwidth Management in TCP/IP Networks

Conventional bandwidth management in TCP/IP networks
is accomplished by a combination of TCP end systems
and routers which queue packets and discard packets
when certain congestion thresholds are exceeded. The
discarded, and therefore unacknowledged, packet serves as
a feedback mechanism to the TCP transmitter. (TCP end
systems are clients or servers running the TCP transport
protocol, typically as part of their operating system.) The
term "bandwidth management" is often used to refer to link
level bandwidth management, e.g. multiple line support for
Point to Point Protocol (PPP). Link level bandwidth management
is essentially the process of keeping track of all
traffic and deciding whether an additional dial line or ISDN
channel should be opened or an extraneous one closed. The
field of this invention is concerned with network level
bandwidth management, i.e. policies to assign available
bandwidth from a single logical link to network flows.
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In a copending U.S. patent application Ser. No. 08/742, 
994, now U.S. Pat. No. 6,038,216, in the name of Robert L. 
Packer, entitled "Method for Explicit Data Rate Control in 
a Packet Communication Environment Without Data Rate 
Supervision," a technique for automatically scheduling TCP 
packets for transmission is disclosed. Furthermore, in U.S. 
Pat. No. 5,802,106, in the name of Robert L. Packer, entitled 
"Method for Rapid Data Rate Detection in a Packet Com- 
munication Environment Without Data Rate Supervision," a 
technique for automatically determining the data rate of a 
TCP connection is disclosed. Finally, in a copending U.S. 
Pat. application Ser. No. 08/977,376, now abandoned, in the 
name of Robert L. Packer, entitled "Method for Managing 
Flow Bandwidth Utilization at Network, Transport and 
Application Layers in Store and Forward Network," a tech- 
nique for automatically allocating bandwidth based upon 
data rates of TCP connections according to a hierarchical 
classification paradigm is disclosed. 

Automated tools assist the network manager in configuring
and managing the network equipped with the rate control
techniques described in these copending applications. In a
related copending application, a tool is described which
enables a network manager to automatically produce policies
for traffic being automatically detected in a network. It
is described in a copending U.S. patent application Ser. No.
09/198,051, still pending, in the name of Guy Riddle,
entitled "Method for Automatically Determining a Traffic
Policy in a Packet Communications Network", based on
U.S. Provisional Patent Application Ser. No. 60/066,864.
The subject of the present invention is also a tool designed
to assist the network manager.

While these efforts teach methods for solving problems
associated with scheduling transmissions, automatically
determining data flow rate on a TCP connection, allocating
bandwidth based upon a classification of network traffic and
automatically determining a policy, respectively, there is no
teaching in the prior art of methods for automatically
classifying packet traffic based upon information gathered
from multiple layers in a multi-layer protocol network.

Bandwidth has become the expensive commodity of the
'90s. As traffic expands faster than resources, the need to
"prioritize" a scarce resource becomes ever more critical.
One way to solve this is by applying "policies" to control
traffic classified as to type of service required in order to
more efficiently match resources with traffic.

Traffic may be classified by type, e.g., E-mail, web surfing,
file transfer, at various levels. For example, to classify by
network paradigm, examining messages for an IEEE
source/destination service access point (SAP) or a sub-layer
access protocol (SNAP) yields a very broad indicator, i.e.,
SNA or IP. More specific types exist, such as whether an IP
protocol field in an IP header indicates TCP or UDP. Well
known connection ports provide indications at the
application layer, i.e., SMTP or HTTP.

Classification is not new. Firewall products like
"CheckPoint FireWall-1," a product of Checkpoint Software
Technologies, Inc., a company with headquarters in
Redwood City, Calif., have rules for matching traffic.
Bandwidth managers such as "Aponet," a product of Aponet,
Inc., a company with headquarters in San Jose, Calif.,
classify by destination. The PacketShaper, a product of
Packeteer, Inc., a company with headquarters in Cupertino,
Calif., allows a user to manually enter rules to match various
traffic types for statistical tracking, i.e., counting by
transaction, byte count, rates, etc. However, manual rule
entry requires a level of expertise that limits the appeal for
such a system to network savvy customers. What is really
needed is a method for analyzing real traffic in a customer's
network and automatically producing a list of the "found
traffic."

SUMMARY OF THE INVENTION

According to the invention, in a packet communication
environment, a method is provided for automatically
classifying packet flows for use in allocating bandwidth
resources and the like by a rule of assignment of a service
level. The method comprises applying individual instances
of traffic classification paradigms to packet network flows
based on selectable information obtained from a plurality of
layers of a multi-layered communication protocol in order to
define a characteristic class, then mapping the flow to the
defined traffic class. It is useful to note that the automatic
classification is sufficiently robust to classify a complete
enumeration of the possible traffic.

In the present invention, network managers need not know
the technical aspects of each kind of traffic in order to
configure traffic classes, and service aggregates bundle traffic
to provide a convenience to the user by clarifying processing
and enabling the user to obtain group counts of all parts
comprising a service.

The invention will be better understood upon reference to
the following detailed description in connection with the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1A depicts a representative client server relationship 
in accordance with a particular embodiment of the inven- 
tion; 

FIG. 1B depicts a functional perspective of the
representative client server relationship in accordance with a
particular embodiment of the invention;

FIG. 1C depicts a representative internetworking envi- 
ronment in accordance with a particular embodiment of the 
invention; 

FIG. 1D depicts a relationship diagram of the layers of the
TCP/IP protocol suite;

FIGS. 2A-2B depict representative divisions of band- 
width; 

FIG. 3 depicts a component diagram of processes and data 
structures in accordance with a particular embodiment of the 
invention; and 

FIGS. 4A-4B depict flowcharts of process steps in auto- 
matically classifying traffic in accordance with a particular 
embodiment of the invention. 

DESCRIPTION OF SPECIFIC EMBODIMENTS 
1.0 Introduction 

The present invention provides techniques to automati- 
cally classify a plurality of heterogeneous packets in a 
packet telecommunications system for management of net- 
work bandwidth in systems such as a private area network, 
a wide area network or an internetwork. Systems according 
to the present invention enable network managers to: auto- 
matically define traffic classes, for which policies may then 
be created for specifying service levels for the traffic classes 
and isolating bandwidth resources associated with certain 
traffic classes. Inbound as well as outbound traffic may be 
managed. Table 1 provides a definitional list of terminology 
used herein.

TABLE 1

LIST OF DEFINITIONAL TERMS

ADMISSIONS    A policy invoked whenever a system according to the
CONTROL       invention detects that a guaranteed information rate
              cannot be maintained. An admissions control policy is
              analogous to a busy signal in the telephone world.
CLASS SEARCH  A search method based upon traversal of a N-ary tree
ORDER         data structure containing classes.
COMMITTED     A rate of data flow allocated to reserved service traffic
INFORMATION   for rate based bandwidth allocation for a committed
RATE (CIR)    bandwidth. Also called a guaranteed information rate
              (GIR).
EXCEPTION     A class of traffic provided by the user which
              supersedes an automatically determined classification
              order.
EXCESS        A rate of data flow allocated to reserved service traffic
INFORMATION   for rate based bandwidth allocation for uncommitted
RATE (EIR)    bandwidth resources.
FLOW          A flow is a single instance of a traffic class. For
              example, all packets in a TCP connection belong to the
              same flow, as do all packets in a UDP session.
GUARANTEED    A rate of data flow allocated to reserved service traffic
INFORMATION   for rate based bandwidth allocation for a committed
RATE (GIR)    bandwidth. Also called a committed information rate
              (CIR).
INSIDE        On the system side of an access link. Outside clients
              and servers are on the other side of the access link.
ISOLATION     Isolation is the degree that bandwidth resources are
              allocable to traffic classes.
OUTSIDE       On the opposite side of an access link as viewed from
              the perspective of the system on which the software
              resides.
PARTITION     Partition is an arbitrary unit of network resources.
POLICY        A rule for the assignment of a service level to a flow.
POLICY        A method for assigning policies to flows for which no
INHERITANCE   policy exists in a hierarchical arrangement of policies.
              For example, if a flow is determined to be comprised
              of FTP packets for Host A, and no corresponding
              policy exists, a policy associated with a parent node,
              such as an FTP policy, may be located and used.
POLICY BASED  An adjustment of a requested data rate for a particular
SCALING       flow based upon the policy associated with the flow
              and information about the flow's potential rate.
SCALED RATE   Assignment of a data rate based upon detected speed.
SERVICE       A service paradigm having a combination of
LEVEL         characteristics defined by a network manager to handle
              a particular class of traffic. Service levels may be
              designated as either reserved or unreserved.
TRAFFIC       All traffic between a client and a server end points. A
CLASS         single instance of a traffic class is called a flow.
              Traffic classes have properties or class attributes such
              as directionality, which is the property of traffic to be
              flowing inbound or outbound.
UNRESERVED    Unreserved service is a service level defined in terms
SERVICE       of priority in which no reservation of bandwidth is
              made.
URI           A Universal Resource Identifier is the name of the
              location field in a web reference address. It is also
              called a URL or Universal Resource Locator.

1.1 Hardware Overview

The method for automatically classifying heterogeneous
packets in a packet telecommunications environment of the
present invention is implemented in the C programming
language and is operational on a computer system such as
shown in FIG. 1A. This invention may be implemented in a
client-server environment, but a client-server environment is
not essential. This figure shows a conventional client-server
computer system which includes a server 20 and numerous
clients, one of which is shown as client 25. The term
"server" is used in the context of the invention, wherein
the server receives queries from (typically remote) clients,
does substantially all the processing necessary to formulate
responses to the queries, and provides these responses to the
clients. However, server 20 may itself act in the capacity of
a client when it accesses remote databases located at another
node acting as a database server.

The hardware configurations are in general standard and
will be described only briefly. In accordance with known
practice, server 20 includes one or more processors 30 which
communicate with a number of peripheral devices via a bus
subsystem 32. These peripheral devices typically include a
storage subsystem 35, comprised of a memory subsystem
35a and a file storage subsystem 35b holding computer
programs (e.g., code or instructions) and data, a set of user
interface input and output devices 37, and an interface to
outside networks, which may employ Ethernet, Token Ring,
ATM, IEEE 802.3, ITU X.25, Serial Link Internet Protocol
(SLIP) or the public switched telephone network. This
interface is shown schematically as a "Network Interface"
block 40. It is coupled to corresponding interface devices in
client computers via a network connection 45.

Client 25 has the same general configuration, although
typically with less storage and processing capability. Thus,
while the client computer could be a terminal or a low-end
personal computer, the server computer is generally a
high-end workstation or mainframe, such as a SUN SPARC
server. Corresponding elements and subsystems in the client
computer are shown with corresponding, but primed,
reference numerals.

Bus subsystem 32 is shown schematically as a single bus,
but a typical system has a number of buses such as a local
bus and one or more expansion buses (e.g., ADB, SCSI, ISA,
EISA, MCA, NuBus, or PCI), as well as serial and parallel
ports. Network connections are usually established through
a device such as a network adapter on one of these expansion
buses or a modem on a serial port. The client computer may
be a desktop system or a portable system.

The user interacts with the system using interface devices
37' (or devices 37 in a standalone system). For example,
client queries are entered via a keyboard, communicated to
client processor 30', and thence to modem or network
interface 40' over bus subsystem 32'. The query is then
communicated to server 20 via network connection 45.
Similarly, results of the query are communicated from the
server to the client via network connection 45 for output on
one of devices 37' (say a display or a printer), or may be
stored on storage subsystem 35'.

FIG. 1B is a functional diagram of a computer system
such as that of FIG. 1A. FIG. 1B depicts a server 20, and a
representative client 25 of a plurality of clients which may
interact with the server 20 via the Internet 45 or any other
communications method. Blocks to the right of the server
are indicative of the processing steps and functions which
occur in the server's program and data storage indicated by
blocks 35a and 35b in FIG. 1A. A TCP/IP "stack" 44 works
in conjunction with Operating System 42 to communicate
with processes over a network or serial connection attaching
Server 20 to Internet 45. Web server software 46 executes
concurrently and cooperatively with other processes in
server 20 to make data objects 50 and 51 available to
requesting clients. A Common Gateway Interface (CGI)
script 55 enables information from user clients to be acted
upon by web server 46, or other processes within server 20.
Responses to client queries may be returned to the clients in
the form of Hypertext Markup Language (HTML) document
outputs which are then communicated via Internet 45
back to the user.

Client 25 in FIG. 1B possesses software implementing
functional processes operatively disposed in its program and
data storage as indicated by block 35a' in FIG. 1A. TCP/IP
stack 44' works in conjunction with Operating System 42' to
communicate with processes over a network or serial
connection attaching Client 25 to Internet 45. Software
implementing the function of a web browser 46' executes
concurrently and cooperatively with other processes in client
25 to make requests of server 20 for data objects 50 and 51.
The user of the client may interact via the web browser 46' to
make such queries of the server 20 via Internet 45 and to
view responses from the server 20 via Internet 45 on the web
browser 46'.
1.2 Network Overview

FIG. 1C is illustrative of the internetworking of a plurality
of clients such as client 25 of FIGS. 1A and 1B and a
plurality of servers such as server 20 of FIGS. 1A and 1B as
described herein above. In FIG. 1C, network 60 is an
example of a Token Ring or frame oriented network.
Network 60 links host 61, such as an IBM RS6000 RISC
workstation, which may be running the AIX operating
system, to host 62, which is a personal computer, which may
be running Windows 95, IBM OS/2 or a DOS operating
system, and host 63, which may be an IBM AS/400
computer, which may be running the OS/400 operating
system. Network 60 is internetworked to network 70 via a
system gateway which is depicted here as router 75, but
which may also be a gateway having a firewall or a network
bridge. Network 70 is an example of an Ethernet network
that interconnects host 71, which is a SPARC workstation,
which may be running the SunOS operating system, with
host 72, which may be a Digital Equipment VAX6000
computer which may be running the VMS operating system.

Router 75 is a network access point (NAP) of network 70 
and network 60. Router 75 employs a Token Ring adapter 
and Ethernet adapter. This enables router 75 to interface with 
the two heterogeneous networks. Router 75 is also aware of 
the Inter-network Protocols, such as ICMP and RIP, which 
are described herein below. 

FIG. 1D is illustrative of the constituents of the
Transmission Control Protocol/Internet Protocol (TCP/IP)
protocol suite. The base layer of the TCP/IP protocol suite is the
physical layer 80, which defines the mechanical, electrical, 
functional and procedural standards for the physical trans- 
mission of data over communications media, such as, for 
example, the network connection 45 of FIG. 1A. The 
physical layer may comprise electrical, mechanical or func- 
tional standards such as whether a network is packet switch- 
ing or frame-switching; or whether a network is based on a 
Carrier Sense Multiple Access/Collision Detection (CSMA/ 
CD) or a frame relay paradigm. 

Overlying the physical layer is the data link layer 82. The 
data link layer provides the function and protocols to trans- 
fer data between network resources and to detect errors that 
may occur at the physical layer. Operating modes at the 
datalink layer comprise such standardized network topolo- 
gies as IEEE 802.3 Ethernet, IEEE 802.5 Token Ring, ITU 
X.25, or serial (SLIP) protocols. 

Network layer protocols 84 overlay the datalink layer and
provide the means for establishing connections between
networks. The standards of network layer protocols provide
operational control procedures for internetworking
communications and routing information through multiple
heterogeneous networks. Examples of network layer
protocols are the Internet Protocol (IP) and the Internet
Control Message Protocol (ICMP). The Address Resolution
Protocol (ARP) is used to correlate an Internet address and a
Media Access Address (MAC) for a particular host. The
Routing Information Protocol (RIP) is a dynamic routing
protocol for passing routing information between hosts on
networks. The Internet Control Message Protocol (ICMP) is
an internal protocol for passing control messages between
hosts on various networks. ICMP messages provide feedback
about events in the network environment or can help
determine if a path exists to a particular host in the network
environment. The latter is called a "Ping". The Internet
Protocol (IP) provides the basic mechanism for routing
packets of information in the Internet. IP is a non-reliable
communication protocol. It provides a "best efforts" delivery
service and does not commit network resources to a
particular transaction, nor does it perform retransmissions or
give acknowledgments.

The transport layer protocols 86 provide end-to-end
transport services across multiple heterogeneous networks.
The User Datagram Protocol (UDP) provides a
connectionless, datagram oriented service which provides a
non-reliable delivery mechanism for streams of information.
The Transmission Control Protocol (TCP) provides a reliable
session-based service for delivery of sequenced packets of
information across the Internet. TCP provides a connection
oriented reliable mechanism for information delivery.

The session, or application layer 88 provides a list of
network applications and utilities, a few of which are
illustrated here. For example, File Transfer Protocol (FTP) is
a standard TCP/IP protocol for transferring files from one
machine to another. FTP clients establish sessions through
TCP connections with FTP servers in order to obtain files.
Telnet is a standard TCP/IP protocol for remote terminal
connection. A Telnet client acts as a terminal emulator and
establishes a connection using TCP as the transport
mechanism with a Telnet server. The Simple Network
Management Protocol (SNMP) is a standard for managing
TCP/IP networks. SNMP tasks, called "agents", monitor
network status parameters and transmit these status
parameters to SNMP tasks called "managers." Managers
track the status of associated networks. A Remote Procedure
Call (RPC) is a programming interface which enables
programs to invoke remote functions on server machines.
The Hypertext Transfer Protocol (HTTP) facilitates the
transfer of data objects across networks via a system of
uniform resource indicators (URI).

The Hypertext Transfer Protocol is a simple protocol built
on top of Transmission Control Protocol (TCP). It is the
mechanism which underlies the function of the World Wide
Web. The HTTP provides a method for users to obtain data
objects from various hosts acting as servers on the Internet.

2.0 Traffic Class 

A traffic class is broadly defined as traffic between one or
more clients and one or more servers. A single instance of a
traffic class is called a flow. Traffic classes have the property,
or class attribute, of being directional, i.e., all traffic flowing
inbound will belong to different traffic classes and be
managed separately from traffic flowing outbound. The
directional property enables asymmetric classification and
control of traffic, i.e., inbound and outbound flows belong to
different classes which may be managed independent of one
another.

Traffic classes may be defined at any level of the IP
protocol as well as for other non-IP protocols. For example,
at the IP level, traffic may be defined as only those flows
between a specified set of inside and outside IP addresses
or domain names. An example of such a low level traffic
class definition would be all traffic between my network and
other corporate offices throughout the Internet. At the
application level, traffic classes may be defined for specific
URIs within a web server. Traffic classes may be defined
having "Web aware" class attributes. For example, a traffic
class could be created such as all URIs matching "*.html" for
all servers, or all URI patterns matching "*.gif" for server X,
or for access to server Y with URI pattern "/sales/*" from
client Z, wherein '*' is a wildcard character, i.e., a character
which matches all other character combinations. Traffic class
attributes left unspecified will simply match any value for
that attribute. For example, a traffic class that accesses data
objects within a certain directory path of a web server is
specified by a URI pattern of the directory path to be
managed, e.g., "/sales/*".
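
The wildcard and "unspecified matches anything" rules above can be illustrated with a short Python sketch. The `TrafficSpec` type and its field names are hypothetical and introduced only for this example. It uses the standard library's `fnmatchcase` for the '*' wildcard; note that `fnmatchcase` also treats '?' and '[...]' as special characters, which goes beyond what the text describes.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class TrafficSpec:
    """Hypothetical traffic class attributes; None means 'unspecified'."""
    server: Optional[str] = None
    client: Optional[str] = None
    uri_pattern: Optional[str] = None

    def matches(self, server, client, uri):
        # An unspecified attribute matches any value for that attribute.
        if self.server is not None and self.server != server:
            return False
        if self.client is not None and self.client != client:
            return False
        # '*' in the pattern matches any run of characters.
        if self.uri_pattern is not None and not fnmatchcase(uri, self.uri_pattern):
            return False
        return True


# The three examples from the text, expressed as specifications.
all_html = TrafficSpec(uri_pattern="*.html")
gif_on_x = TrafficSpec(server="X", uri_pattern="*.gif")
sales_y_z = TrafficSpec(server="Y", client="Z", uri_pattern="/sales/*")
```

Here `all_html` leaves server and client unspecified, so it matches an HTML request to any server from any client, while `sales_y_z` matches only requests from client Z to server Y under the "/sales/" path.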
2.1 Classifying Traffic

The present invention provides a method for classifying 
traffic according to a definable set of classification attributes 
selectable by the manager, including selecting a subset of 
traffic of interest to be classified. The invention provides the 
ability to classify and search traffic based upon multiple 
orthogonal classification attributes. 
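
To make classification over orthogonal attributes concrete, the following is a minimal Python sketch of a hierarchical class search, here by department first and by service type second, with a match-all default at the root. All names (`ClassNode`, `classify`, the flow dictionary keys and the policy strings) are hypothetical. The sketch assumes that sibling classes are listed from most to least specific and that a class with no policy of its own uses the policy of its nearest ancestor.

```python
class ClassNode:
    """One traffic class in a hypothetical classification tree."""

    def __init__(self, name, predicate, policy=None, children=()):
        self.name = name
        self.predicate = predicate      # callable(flow) -> bool
        self.policy = policy            # None: inherit from the parent
        self.children = list(children)  # assumed ordered most specific first


def classify(node, flow, inherited=None):
    """Return (class name, policy) for the deepest class matching flow."""
    policy = node.policy if node.policy is not None else inherited
    for child in node.children:
        if child.predicate(flow):
            # First matching child wins; continue the search below it.
            return classify(child, flow, policy)
    # No child matched: this class is the most specific match.
    return node.name, policy


tree = ClassNode("default", lambda f: True, policy="default policy", children=[
    ClassNode("dept_a", lambda f: f["inside_subnet"] == "A", children=[
        ClassNode("dept_a_ftp", lambda f: f["service"] == "ftp",
                  policy="ftp policy"),
        ClassNode("dept_a_web", lambda f: f["service"] == "http",
                  policy="web policy"),
    ]),
    ClassNode("dept_b", lambda f: f["inside_subnet"] == "B"),
])
```

A flow from subnet A carrying FTP lands in `dept_a_ftp`; a flow from subnet A carrying an unlisted service stops at `dept_a` and takes the default policy; a flow from an unknown subnet falls through to the match-all `default` class.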

Traffic class membership may be hierarchical. Thus, a 
flow may be classified by a series of steps through a traffic 
class tree, with the last step (i.e., at the leaves on the 
classification tree) mapping the flow to a policy. The policy
is a rule of assignment for flows. Web traffic may also be
classified by HTTP header types such as Content-Type
(MIME type) or User-Agent.

A classification tree is a data structure representing the
hierarchical aspect of traffic class relationships. Each node
of the classification tree represents a class, and has a traffic
specification, i.e., a set of attributes or characteristics
describing the traffic associated with it. Leaf nodes of the
classification tree may contain policies. According to a
particular embodiment, the classification process checks at
each level if the flow being classified matches the attributes
of a given traffic class. If it does, processing continues down
to the links associated with that node in the tree. If it does
not, the class at the level that matches determines the policy
for the flow being classified. If no policy specific match is
found, the flow is assigned the default policy.

In a preferred embodiment, the classification tree is an
N-ary tree with its nodes ordered by specificity. For
example, in classifying a particular flow in a classification
tree ordered first by organizational departments, the
attributes of the flow are compared with the traffic
specification in each successive department node and if no
match is found, then processing proceeds to the next
subsequent department node. If no match is found, then the
final compare is a default "match all" category. If, however, a
match is found, then classification moves to the children of
this department node. The child nodes may be ordered by an
orthogonal paradigm such as, for example, "service type."
Matching proceeds according to the order of specificity in
the child nodes. Processing proceeds in this manner,
traversing downward and from left to right in FIGS. 2A and
2B, which describe a classification tree, searching the
plurality of orthogonal paradigms. Key to implementing this
hierarchy is that the nodes are arranged in decreasing order
of specificity. This permits search to find the most specific
class for the traffic before more general.

Table 2 depicts components from which traffic classes
may be built. Note that the orientation of the server (inside
or outside) is specified. And as noted above, any traffic class
component may be unspecified, i.e., set to match any value.

TABLE 2

Components of a Traffic Class Specifier

Inside                   Global                 Outside
(Client or Server)                              (Server or Client)

IP Address/Domain Name   TCP or UDP Service,    IP Address/Domain Name
Port Number              e.g., WWW,             Port Number
MAC Address              FTP, RealAudio, etc.   MAC Address
                         URI pattern for
                         Web Service,
                         MIME type for
                         Web Service
                         IPX Service
                         SNA Service
                         LAT Service
                         IP precedence

FIGS. 2A and 2B depict representative allocations of
bandwidth made by a hypothetical network manager as an
example. In FIG. 2A, the network manager has decided to
divide her network resources first by allocating bandwidth
between Departments A and B. FIG. 2A shows the resulting
classification tree 201, in which Department A bandwidth
resources 202 and Department B bandwidth resources 204
each have their own nodes representing a specific traffic
class for that department. Each traffic class may have a
policy attribute associated with it. For example, in FIG. 2A,
the Department A resources node 202 has the policy attribute
Inside Host Subnet A associated with it. Next, the network
manager has chosen to divide the bandwidth resources of
Department A among two applications. She allocates an FTP
traffic class 206 and a World Wide Web server traffic class
208. Each of these nodes may have a separate policy
attribute associated with them. For example, in FIG. 2A, the
FTP node 206 has an attribute Outside port 20 associated
with it. Similarly, the network manager has chosen to divide
network bandwidth resources of Department B into an FTP
server traffic class 210 and a World Wide Web server traffic
class 212. Each may have their own respective policies.

FIG. 2B shows a second example 203, wherein the
network manager has chosen to first divide network
bandwidth resource between web traffic and TCP traffic. She
creates three traffic nodes, a web traffic node 220, a TCP
traffic node 224 and a default node 225. Next, she divides the
web traffic among two organizational departments by
creating a Department A node 226, and a Department B node
228. Each may have its own associated policy. Similarly, she
divides TCP network bandwidth into separate traffic classes
by creating a Department A node 230 and a Department B
node 232. Each represents a separate traffic class which may
have its own policy.

All traffic which does not match any user specified traffic
class falls into an automatically created default traffic class
which has a default policy. In FIG. 2A, the default category
is depicted by a default node 205, and in FIG. 2B, the default
category is depicted by a default node 225.

3.0 Automatically Classifying Traffic

3.1 Automatic Traffic Classification

Network traffic is automatically classified under existing
classes, beginning with the broadest classes, an inbound
traffic class and an outbound traffic class, in protocol layer
independent categories. For example, a particular instance of
traffic may be classified according to its transport layer
characteristics, e.g., Internet Protocol port number, as well
as its application layer information, e.g., SMTP.
Characteristics such as MIME types may also be automatically
identified. Standard protocols, such as IPX and SNA, and
services, such as SMTP and FTP, are recognized for
automatic classification. Classification is performed to the
most specific level determinable. For example, in select
embodiments, non-IP traffic, such as SNA, may be classified
only by protocol, whereas Internet Protocol traffic may be
classified to the /etc/services level. Classification beyond a
terminal classification level is detected and prevented. For
example, in a select embodiment, a class matching "ipx" or
"nntp" will not be further automatically classified.

3.1.1 Service Aggregates

A service aggregate is provided for certain applications
that use more than one connection in a particular
conversation between a client and a server. For example, an
FTP client in conversation with an FTP server employs a
command channel and a transfer channel, which are distinct
TCP sessions on two different ports. In cases where two or
three TCP or UDP sessions exist for each conversation
between one client and one server, it is useful to provide a
common traffic class, i.e., the service aggregate, containing
the separate conversations. In practice, these types of
conversations are between the same two hosts, but use
different ports. According to the invention, a class is created
with a plurality of traffic specifications, each matching
various component conversations.

3.1.2 Subclassification Under Specified Criterion

Subclassification of traffic into a tree is performed by
matching the hosts and then searching for particular
services. Traffic specifications are aggregate kinds of traffic
for a traffic class, e.g., different components of FTP may
reside under class FTP. Subclassification is performed by
first locating a class that matches, and then performing finer
grade matchings. Processing commences with a decision on
what traffic is to be subclassified. A marker is placed in the
match_all default node so that when match processing
reaches the marker, the autoclassification processing
depicted in flowchart 403 determines that it has not found
an existing class for the traffic being classified.

3.1.5 Identity of Traffic Based Upon Resource Creator's
Class

A traffic class may be inferred from determining the
identity of the creator of a resource used by the traffic class.
For example, the identity of traffic using a certain connection
can be determined by finding the identity of the creator of
the connection. This method is used to detect Real Time
Protocol (RTP) for point-to-point telephony, RTP for
broadcast streaming, CCITT/ITU H320-telephony over ISDN,
H323-internet telephony over the internet (bidirectional) and
RTSP real time streaming protocol for movies
(unidirectional).

3.1.6 Dynamic Ports

Applications having a well known port for a server may
make use of dynamic ports. Some applications will send
initial messages across a first connection, then negotiate a
dynamic port for further conversation. During the existence
of a connection, both endpoints are known. A check is made
for two simultaneous connections to the same non
well-known port, at the same time, from different locations.
This condition is indicative of a connection port for some
application. Varieties of the dynamic port exist in
applications. Certain dynamic ports are incorporated into a
client. Others are fixed but not registered. Still others are
negotiated during a protocol exchange, as for example in
passive FTP.

3.2 Automatic Traffic Classification Processing

FIG. 3 depicts components of a system for automatically
classifying traffic according to the invention. The system
includes a traffic tree 302, in which new traffic will be
classified under a particular member class node. A traffic
classifier 304 detects services for incoming traffic.
Alternatively, the classifier may start with a service and
determine the hosts using it. A knowledge base 306 contains
heuristics for determining traffic classes. The knowledge
base may be embodied in a file or a relational database. In a
particular embodiment, the knowledge is contained within a
data structure resident in memory.

3. 1.3 Default Suggested Policies A plurality of saved lists 308 stores classified traffic pending 
A default policy may be suggested or, in select incorporation into traffic tree 302. In select embodiments, 

embodiments, automatically applied, to a traffic class which entries for each instance of traffic may be kept. In alternate 

has been automatically classified. Applying suggested or 40 embodiments, a copy of an entry and a count of duplicate 

default policies for a new class at a user's option is described copies for the entry is maintained. 

in a copending, commonly owned, U.S. patent application FIG. 4A depicts a flowchart 401 of processing steps for 

Ser. No. 09/198,051, still pending, entitled, "Method for automatically classifying traffic. In a step 402, a flow speci- 

Automatically Determining a Traffic Policy in a Packet fication is parsed from the flow being classified. Then in a 

Communications Network", which is incorporated herein by 45 step 404, the flow specification parsed from the flow in step 

reference in its entirety for all purposes. 402 is compared with the traffic specifications in each node 

3.1.4 Analysis of Data in Determining Traffic Class of the classification tree. Rules are checked starting from 
In a preferable embodiment, classification can extend to most specific to least specific. In a decisional step 406, a 

examination of the data contained in a flow's packets. determination is made if traffic matches one of the classes 
Certain traffic may be distinguished by a signature even if it 50 being classified. If this is so, then in a step 408, an entry is 
originates with a server run on a non-standard port, for made in a list of identifying characteristics, such as protocol 
example, an HTTP conversation on port 8080 would not be type (SAP), IP protocol number, server port, traffic type if 
otherwise determinable as HTTP from the port number. known, MIME type, a time of occurrence of the traffic. In an 
Further analysis of the data is conducted in order to deter- optional step 410, duplicate instances having the same 
mine classification in instances where: 1) FTP commands 55 identifying characteristics are suppressed, in favor of keep- 
are used to define server ports, 2) HTTP protocol is used for ing a count of the duplicates and a most recent time traffic 
non-web purposes. The data is examined for indication of with these identifying characteristics was encountered. In an 
push traffic, such as pointcast, which uses HTTP as a optional step 412, a byte count of traffic of this type has been 
transport mechanism. These uses may be isolated and clas- detected is included. Otherwise, the automatic classification 
sified into a separate class. Marimba and pointcast can be 60 has failed to determine a class and processing returns, 
distinguished by looking into the data for a signature content In a preferable embodiment, processing according to 
header in the gel request. PointCast has URLs that begin with flowchart 401 may execute on multiple instances of saved 
"/FlDO-l/." Other applications in which protocol can be list 308. 

inferred from data include Telnet traffic. Both tn3270 and 3.2.1 Displaying Results to a User 

tn3270E (emulation) may be detected by looking into data 65 In an optional step 413 (not show), after the processing of 

and given a different class. Telnet traffic has option ncgo- flowchart 401 completes or at periodic intervals or on 

tiations which may indicate an appropriate class. demand, a list of traffic classes produced in steps 402 
through 412 are displayed to a network manager. The list
may be sorted by any well-known criteria such as: 1) most
"hits" during a recent interval, 2) most recently-seen (most
recent time first), 3) most data transferred (bytes/second)
during some interval, or a moving average. The user may
choose an interval length or display cutoff point (how many
items, how recent, at least B bytes per second, or other
thresholds). The network manager may then take some
action (e.g. pushing a button) to select the traffic types she
wishes to add to the classification tree. The display can be
hierarchical, as depicted in lines (3) below:

FTP
[illegible]
FTP-data
to host1
[illegible]
FTP-cmd
FTP-data
[illegible]
images
java
text
port 9999
(3)

wherein the "port 9999" entry is an inference corresponding
to an application checking for repeated or simultaneous
connections made to a specific port.

In a related embodiment, a threshold for display or class
creation of well-known traffic types is provided.

3.2.2 Interval Based Incorporation
In an alternative embodiment, at select intervals of time,
non matching traffic is analyzed, and either 1) recognized
and added to the tree, or 2) for repeated attempts to request a
server connection port that is not known, upon exceeding a
certain threshold, a class for the port's traffic is created and
added to the classification tree.

FIG. 4B depicts a flowchart 403 of the processing steps
for integrating traffic classes into a classification tree in an
alternative embodiment. Processing steps of flowchart 403
periodically at a defined interval of seconds, having a value
of 30 in the preferable embodiment, incorporate newly
classified traffic into the classification tree. In a step 420, an
instance of saved traffic is retrieved from the saved traffic list
308. Next in a decisional step 422, the instance of saved
traffic is examined to determine whether it is well-known
(e.g. registered SAP, protocol type, assigned port number)
and a name representing its type exists. If this is so then
processing continues with a test of whether the saved traffic
belongs to a service aggregate in step 426. Otherwise, in a
step 423 the instance of saved traffic is examined to determine
whether it appears to be a server connection port of an
unregistered IP port (or a port that has not been configured).
If this is not so, then processing continues with the next
traffic class in the saved list in step 420. In decisional step
426, the instance of saved traffic is examined to determine
whether it belongs to a service aggregate. For example, an
FTP session has one flow that is used to exchange commands
and responses and a second flow that is used to
transport data files. If the traffic does belong to a service
aggregate, then in a step 428, a traffic class is created which
will match all components of the service aggregate. In a
further step 425, a new traffic class is created to match the
instance of saved traffic. The class may be flat or hierarchical.

In an optional step, a suggested policy is determined for
the traffic class created in step 425. Next, in a decisional step
432, a limit is checked to verify that the number of automatically
created classes has not exceeded a specified maximum.

In a related embodiment in place of step 425, a display of
traffic classes, sorted by most recently used, most hits,
number of bytes received during any interval, which is
determined by a plurality of time stamps, is available on
demand to a network manager. The network manager indicates
that the traffic is to be added to the tree.

In a particular embodiment a threshold is employed to
determine traffic for which a separate class should be added.
A minimum threshold indicates whether a particular
[port] has been [used] k times in [the last] s seconds. If
traffic is well known, i.e., SMTP, it is added to a traffic class
immediately, i.e., threshold is equal to one, otherwise, the
threshold is set equal to an arbitrary value, for example,
eleven uses with not more than one minute between any two
uses. A new class for traffic is given a generic name, e.g.,
Port99 traffic. Entries for traffic over a certain maximum
threshold, for example one minute old, are discarded.

In a related embodiment, another method of identifying
an individual traffic class is to detect simultaneous connections
to the same host port from different clients. This
provides an indication that the port is a well-known connection
port.

Traffic classes are created for any combination of the
above mentioned categories. A flag is added to all traffic
classes so created in order to indicate that it is the product
of the auto classifier.

3.2 Command Language Interface:
In a particular embodiment, function of the classifier 304
is controlled by a command language interface. Table 3
depicts a plurality of command language interface commands.

TABLE 3

setup autoclassify {on|off}
    To activate autoclassification for various classes to detect
    well-known protocols and services:
class auto <tclass> {inside|outside|both}
    To detect services with the host on the inside, the outside,
    or both directions.
class auto <tclass> off
    To turn off use

The new classes have names in the format of lines (4)
below:

<direction>_<service>_<parent> or
<direction>_port_<number>_<parent> or
<direction>_<service>_<portnum>_<parent>
(4)

where <direction> is either "inside" or "outside" for TCP/UDP
services or "auto" for others.

If a well-known service on a non-standard port (e.g.
HTTP on 8080) is detected, a name in the last format will be
created, assuming no previous class match.

If a heretofore unknown server-connection port appears to
be "well used", an entry of the second type is created. The
threshold for creation is currently 11 hits with no more than
1 minute (granularity of checking is at least 30 seconds
between running successive autoclassification processes)
between any two hits. For example, see lines (5) below:

inbound/inside/ftp
outside_http_inbound
auto_sna_inbound
inside_port_505_luna
outside_pointcast_8888_inbound
(5)

The "class show" command will now show a 'D' in the
flags for classes currently being autoclassified.

3.3 Syntax of Traffic Specifications:
Raw specifications and traffic specifications have an
inside service field and an outside service field. Each will
have values of SVC_UNKNOWN (0), SVC_CLIENT (1),
or a number greater than 1, which is the service index, an
index into the global table gServiceTable. If a type of service
is known for a connection, the service field at a particular
side will be set to SVC_CLIENT and the service field at the
opposite side will be the index into gServiceTable. If a type
of service is not known for the traffic, both inside service
field and outside service field will be SVC_UNKNOWN. A
person of reasonable skill in the art will appreciate that other
embodiments for the table, such as representing the infor-
mation contained therein as text strings or by any one of a
plurality of possible encoding schemes, are realizable with-
out departing from the present invention.
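
The service-field encoding just described can be illustrated with a minimal Python sketch. Only the constants SVC_UNKNOWN (0) and SVC_CLIENT (1) and the idea of an index into a global service table come from the text; the table contents, the TrafficSpec type and the helper name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Values given in the text.
SVC_UNKNOWN = 0
SVC_CLIENT = 1

# Hypothetical stand-in for the global service table. Slots 0 and 1 are
# reserved so that every real service index is greater than 1.
g_service_table = [None, None, "http", "ftp", "telnet"]


@dataclass
class TrafficSpec:
    # Both fields default to "service not known".
    inside_service: int = SVC_UNKNOWN
    outside_service: int = SVC_UNKNOWN


def spec_for_known_service(name: str, server_side: str) -> TrafficSpec:
    """Mark the server side with the service index and the other side as client."""
    index = g_service_table.index(name)
    if server_side == "outside":
        return TrafficSpec(inside_service=SVC_CLIENT, outside_service=index)
    if server_side == "inside":
        return TrafficSpec(inside_service=index, outside_service=SVC_CLIENT)
    raise ValueError("server_side must be 'inside' or 'outside'")


spec = spec_for_known_service("http", "outside")
unknown = TrafficSpec()
```

Here an HTTP server on the outside yields an inside field of SVC_CLIENT and an outside field holding the table index, while traffic of unknown type leaves both fields at SVC_UNKNOWN.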

Therefore, a traffic specification can have "outside
service:http" (or just "outside HTTP") which is different than
"outside tcp:80". The first will match HTTP on any port 
while the second will match anything on port 80 (including 
pointcast and marimba). 
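
The difference between matching on a service and matching on a port can be sketched as follows. This is an illustration under assumptions, not the patented matcher: the Flow type and both helper functions are hypothetical, and each detected service is treated as a plain label, so derivative services are not folded into their parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    outside_port: int
    # Service inferred from packet data, if any (hypothetical field).
    outside_service: Optional[str]


def matches_service(flow: Flow, service: str) -> bool:
    """Analogue of "outside service:http": the port is ignored."""
    return flow.outside_service == service


def matches_port(flow: Flow, port: int) -> bool:
    """Analogue of "outside tcp:80": the detected service is ignored."""
    return flow.outside_port == port


flows = [
    Flow(outside_port=8080, outside_service="http"),
    Flow(outside_port=80, outside_service="pointcast"),
    Flow(outside_port=80, outside_service="http"),
]

by_service = [f.outside_port for f in flows if matches_service(f, "http")]
by_port = [f.outside_service for f in flows if matches_port(f, 80)]
```

With these sample flows the service match picks up HTTP on both port 8080 and port 80, while the port match picks up everything on port 80, pointcast included.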

Specifying a traffic specification tspec "service: <agg>" 
returns traffic specifications for various traffic belonging to 
the service. Specifying "class new inbound cdls outside dls" 
is the same as "class new inbound cdls outside service:dls- 
wpn" and "class tspec add cdls outside service:dls-rpn". 
Most auto-recognized services will create a class that 
encompasses all the pieces. 
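
The expansion of a service aggregate into its component traffic specifications might look like the sketch below. The aggregate name dls and its members dls-wpn and dls-rpn are taken from the example above; the lookup table and the function expand_tspec are hypothetical.

```python
# Hypothetical table of service aggregates; only the "dls" entry is from the text.
SERVICE_AGGREGATES = {
    "dls": ["dls-wpn", "dls-rpn"],
}


def expand_tspec(side: str, service: str) -> list:
    """Return one traffic specification string per component of the service."""
    # A service that is not an aggregate expands to itself.
    members = SERVICE_AGGREGATES.get(service, [service])
    return [f"{side} service:{member}" for member in members]


dls_specs = expand_tspec("outside", "dls")
http_specs = expand_tspec("outside", "http")
```

A class created from "outside dls" would then carry both component specifications, which is the behavior the two-command equivalent above spells out by hand.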

Network managers need not be aware of services which 
are known to be derivative of others, e.g., pointcast and 
marimba are special cases of HTTP and tn3270 is a special 
case of Telnet, in order to work with the system. 

4.0 Conclusion 

In conclusion, the present invention provides for an
automatic determination of a policy for a packet telecom-
munications system wherein bandwidth is allocated to
requesting flows according to automatically determined
application requirements. An advantage of traffic classifica-
tion techniques according to the present invention is that
network managers need not know the technical aspects of
each kind of traffic in order to configure traffic classes. A
further advantage of the present invention is that traffic
classes may include information such as a MIME type for
web traffic.

Other embodiments of the present invention and its indi-
vidual components will become readily apparent to those
skilled in the art from the foregoing detailed description. As
will be realized, the invention is capable of other and
different embodiments, and its several details are capable of
modifications in various obvious respects, all without
departing from the spirit and the scope of the present
invention. Accordingly, the drawings and detailed descrip- 
tion are to be regarded as illustrative in nature and not as 
restrictive. It is therefore not intended that the invention be 
limited except as indicated by the appended claims. 

What is claimed is:

1. A method for automatically classifying traffic in a 
packet communications network, said network having any 
number of flows, including zero, comprising the steps of: 

parsing a packet into a first flow specification, wherein 
said first flow specification contains at least one
instance of any one of the following: 

a protocol family designation, 

a direction of packet flow designation, 

a protocol type designation, 

a pair of hosts, 

a pair of ports,

in HTTP protocol packets, a pointer to a MIME type;
thereupon, 

matching the first flow specification of the parsing step 
to a plurality of classes represented by a plurality 
nodes of a classification tree type, each said classi- 
fication tree type node having a traffic specification; 
thereupon, 

if a matching classification tree type node was not 
found in the matching step, associating said first flow 
specification with one or more newly-created clas- 
sification tree type nodes; thereupon, 

incorporating said newly-created classification tree 
type nodes into said plurality of classification tree 
type nodes. 

2. The method of claim 1 further comprising the steps of: 
for at least a second flow having a second flow 

specification, recognizing said second flow specifica- 
tion and said first flow specification to comprise 
together a service aggregate; thereupon, 
associating said first flow specification and said second 
flow specification with a newly-created classification 
tree node, said newly-created classification tree type 
node having a first traffic specification corresponding to 
said first flow specification and a second traffic speci- 
fication corresponding to said second flow specifica- 
tion. 

3. The method of claim 1 further comprising the steps of: 
applying policies from said newly-created classification 

tree type nodes to instances of detected traffic. 

4. The method of claim 1 further comprising the steps of: 
for a subclassification under a specified criterion, if a 

matching classification tree type node was found in the 
matching step, said matching classification tree type 
node having at least one child classification tree type 
node, applying the matching, associating, and incorpo- 
rating steps to a particular child classification tree type 
node of said matching classification tree type node as a 
part of classification. 

5. The method of claim 1 wherein the parsing step further 
comprises the steps of: 

examining data contained within a plurality of component 
packets belonging to said first flow for any number of 
a plurality of indicators of any of the following: 

a protocol; 

a service; thereupon, matching said plurality of indicators 
to said classes represented by a plurality of said clas- 
sification tree type nodes. 

6. The method of claim 1 further including measuring 
traffic load and invoking said classification upon achieve- 
ment of a minimum usage threshold. 

7. The method according to claim 1 wherein said match- 
ing step is applied to hierarchically-recognized classes. 

8. A system for automatically classifying traffic in a 
packet telecommunications network, said network having 
any number of flows, including zero, comprising: 

a plurality of network links upon which said traffic is 
carried; 

a network routing means; and, 
a processor means operative to: 
parse a packet into a first flow specification, wherein 
said first flow specification contains at least one 
instance of any one of the following: 
a protocol family designation, 
a direction of packet flow designation, 
a protocol type designation,
a pair of hosts,
a pair of ports, 

in HTTP protocol packets, a pointer to a MIME type; 
thereupon, 

match the first flow specification of the parsing step to
a plurality of classes represented by a plurality of
said classification tree type nodes, each said classi-
fication tree type node having a traffic specification
and a mask, according to the mask; thereupon,
if a matching classification tree type node was not found
in the matching step, associating said first flow speci-
fication with one or more newly-created classification
tree type nodes; thereupon, incorporating said newly-
created classification tree type nodes into said plurality
of said classification tree type nodes.

9. The method of claim 8 further including measuring 
traffic load and invoking said classification upon achieve- 
ment of a minimum usage threshold. 

10. The method according to claim 8 wherein said match-
ing step is applied to hierarchically-recognized classes.

11. A method for classifying traffic in a packet telecom- 
munications network, said network having any number of 
flows, including zero, comprising the steps of: 

parsing a packet into a first flow specification, wherein 
said first flow specification contains at least one
instance of any one of the following: 

a protocol family designation, 

a direction of packet flow designation, 

a protocol type designation,

a pair of hosts,
a pair of ports, 

in HTTP protocol packets, a pointer to a MIME type; 
thereupon, 

matching the first flow specification of the parsing step 
to a plurality of classes represented by a plurality of 
classification tree type nodes, each said classification 
tree type node having a traffic specification; 
thereupon, 

if a matching classification tree type node was not 
found in the matching step, associating said first flow 
specification with at least one more newly-created 
node; thereupon, 

displaying to a network administrator a representation 
of traffic according to said traffic specification for use 
in manual intervention. 

12. The method according to claim 11 further including 
the step of sorting said traffic representation according to 
most recently occurring. 

13. The method according to claim 11 further including 
the step of sorting said traffic representation according to 
most data transferred for a preselected period of time. 

14. The method of claim 11 further including measuring 
traffic load and invoking said classification upon achieve- 
ment of a minimum usage threshold. 

15. The method according to claim 11 wherein said 
matching step is applied to hierarchically-recognized 
classes.

ABSTRACT



A flexible, policy-based, mechanism for managing, 
monitoring, and prioritizing traffic within a network and 
allocating bandwidth to achieve true quality of service 
(QoS) is provided. According to one aspect of the present
invention, a method is provided for managing bandwidth
allocation in a network that employs a non-deterministic
access protocol, such as an Ethernet network. A packet
forwarding device receives information indicative of a set of
traffic groups, such as: a MAC address, or IEEE 802.1p
priority indicator or 802.1Q frame tag, if the QoS policy is
based upon individual station applications; or a physical port 
if the QoS policy is based purely upon topology. The packet 
forwarding device additionally receives bandwidth param- 
eters corresponding to the traffic groups. After receiving a 
packet associated with one of the traffic groups on a first 
port, the packet forwarding device schedules the packet for 
transmission from a second port based upon bandwidth 
parameters corresponding to the traffic group with which the 
packet is associated. According to another aspect of the 
present invention, a method is provided for managing band- 
width allocation in a packet forwarding device. The packet 
forwarding device receives information indicative of a set of 
traffic groups. The packet forwarding device additionally 
receives information defining a QoS policy for the traffic 
groups. After a packet is received by the packet forwarding 
device, a traffic group with which the packet is associated is 
identified. Subsequently, rather than relying on an end-to- 
end signaling protocol for scheduling, the packet is sched- 
uled for transmission based upon the QoS policy for the 
identified traffic group. 

29 Claims, 5 Drawing Sheets

POLICY BASED QUALITY OF SERVICE

This application claims the benefit of U.S. Provisional 
Application No. 60/057,371, filed Aug. 29, 1997. 

COPYRIGHT NOTICE 

Contained herein is material that is subject to copyright 
protection. The copyright owner has no objection to the 
facsimile reproduction of the patent disclosure by any per- 
son as it appears in the Patent and Trademark Office patent 
files or records, but otherwise reserves all rights to the 
copyright whatsoever. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The invention relates generally to the field of computer 
networking devices. More particularly, the invention relates 
to a flexible, policy-based mechanism for managing, 
monitoring, and prioritizing traffic within a network and 
allocating bandwidth to achieve true Quality of Service 
(QoS). 

2. Description of the Related Art 

Network traffic today is more diverse and bandwidth- 
intensive than ever before. Today's intranets are expected to 
support interactive multimedia, full-motion video, rich 
graphic images and digital photography. Expectations about 
the quality and timely presentation of information received 
from networks is higher than ever. Increased network speed 
and bandwidth alone will not satisfy the high demands of 
today's intranets. 

The Internet Engineering Task Force (IETF) is working 
on a draft standard for the Resource Reservation Protocol 
(RSVP), an Internet Protocol-(IP) based protocol that allows 
end-stations, such as desktop computers, to request and 
reserve resources within and across networks. Essentially, 
RSVP is an end-to-end protocol that defines a means of 
communicating the desired Quality of Service between 
routers. RSVP is receiver initiated. The end-station that is
receiving the data stream communicates its requirements to 
an adjacent router and those requirements are passed back to 
all intervening routers between the receiving end-station and 
the source of the data stream and finally to the source of the 
data stream itself. Therefore, it should be apparent that 
RSVP must be implemented across the whole network. That 
is, both end-stations (e.g., the source and destination of the 
data stream) and every router in between should be RSVP 
compliant in order to accommodate the receiving end- 
station's request. 

While RSVP allows applications to obtain some degree of 
guaranteed performance, it is a first-come, first-served 
protocol, which means if there are no other controls within 
the network, an application using RSVP may reserve and 
consume resources that could be needed or more effectively 
utilized by some other mission-critical application. A further
limitation of this approach to resource allocation is the fact 
that end-stations and routers must be altered to be RSVP 
compliant. Finally, RSVP lacks adequate policy mechanisms 
for allowing differentiation between various traffic flows. It 
should be appreciated that without a policy system in place, 
the network manager loses control. 

Recent attempts to facilitate traffic differentiation and 
prioritization include draft standards specified by the Insti- 
tute of Electrical and Electronics Engineers (IEEE). The 
IEEE 802.1Q draft standard provides a packet format for an
application to specify which Virtual Local Area Network
(VLAN) a packet belongs to and the priority of the packet.
The IEEE 802.1p committee provides a guideline to classify
traffic based on a priority indicator in an 802.1Q frame tag.
This allows VLANs to be grouped into eight different traffic
classes or priorities. The IEEE 802.1p committee does not,
however, define the mechanism to service these traffic
classes.
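
As an illustration of where the eight priorities come from, the following Python sketch reads the priority indicator and VLAN identifier out of an 802.1Q-tagged Ethernet frame. The 0x8100 tag protocol identifier, the 3-bit priority field and the 12-bit VLAN identifier are standard 802.1Q layout; the function name and the sample frame are made up for the example.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier marking an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (priority, vlan_id) for a tagged frame, or None if untagged."""
    # Destination MAC (6 bytes) + source MAC (6 bytes) precede the tag.
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    priority = tci >> 13      # top 3 bits: one of eight priorities (0-7)
    vlan_id = tci & 0x0FFF    # low 12 bits: VLAN identifier
    return priority, vlan_id


# Sample frame: zeroed MAC addresses, tag with priority 5 and VLAN 42,
# followed by an IPv4 EtherType.
sample = bytes(12) + struct.pack("!HH", TPID_8021Q, (5 << 13) | 42) + b"\x08\x00"
tagged = parse_vlan_tag(sample)
untagged = parse_vlan_tag(bytes(12) + b"\x08\x00" + bytes(4))
```

The tag only labels a packet with a priority; how a device services each of the eight classes is left open, which is the gap noted above.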

What is needed is a way to provide true Quality of Service
("QoS") in a network employing a non-deterministic access
protocol, such as an Ethernet network, that not only has the
ability to prioritize and service different traffic classes, but
additionally provides bandwidth management and guaran-
tees a quantifiable measure of service for packets associated
with a particular traffic class. More specifically, with respect
to bandwidth management, it is desirable to employ a
weighted fair queuing delivery schedule which shares avail-
able bandwidth so that high priority traffic is usually sent
first, but low priority traffic is still guaranteed an acceptable
minimum bandwidth allocation. Also, it is desirable to
centralize the control over bandwidth allocation and traffic
priority to allow for QoS without having to upgrade or alter
end-stations and existing routers as is typically required by
end-to-end protocol solutions. Further, it would be advan-
tageous to put the control in the hands of network managers
by performing bandwidth allocation and traffic prioritization
based upon a set of manager-defined administrative policies.
Finally, since there are many levels of control a network
manager may elect to administer, it is desirable to provide a
variety of scheduling mechanisms based upon a core set of
QoS profile attributes.
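
One way to picture a schedule in which high priority traffic usually goes first while low priority traffic keeps a minimum allocation is the sketch below. It is a simplified illustration under assumptions, not the scheduler of the invention: the QosQueue fields, the convention that a larger priority value is more important, and the selection rule are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QosQueue:
    name: str
    priority: int                    # assumption: larger means more important
    min_bandwidth: float             # guaranteed rate, bits per second
    current_bandwidth: float = 0.0   # recently measured rate, bits per second
    packets: deque = field(default_factory=deque)


def select_next_queue(queues):
    """Pick the queue that should send the next packet, or None if all are empty."""
    backlogged = [q for q in queues if q.packets]
    if not backlogged:
        return None
    # Queues still below their guaranteed minimum are served first.
    starved = [q for q in backlogged if q.current_bandwidth < q.min_bandwidth]
    candidates = starved or backlogged
    # Within the chosen group, the highest priority wins.
    return max(candidates, key=lambda q: q.priority)


video = QosQueue("video", priority=7, min_bandwidth=1e6, current_bandwidth=5e6)
bulk = QosQueue("bulk", priority=1, min_bandwidth=2e6, current_bandwidth=0.5e6)
video.packets.append("v1")
bulk.packets.append("b1")
first_choice = select_next_queue([video, bulk]).name
```

In this example the low priority queue is chosen because it has not yet received its minimum; once its measured rate reaches the minimum, the high priority queue is chosen again.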

BRIEF SUMMARY OF THE INVENTION 

A flexible, policy-based, mechanism for managing,
monitoring, and prioritizing traffic within a network and
allocating bandwidth to achieve true Quality of Service
(QoS) is described. According to one aspect of the present
invention, a method is provided for managing bandwidth
allocation in a network that employs a non-deterministic
access protocol. A packet forwarding device receives infor-
mation indicative of a set of traffic groups. The packet
forwarding device additionally receives parameters, such as
bandwidth and priority parameters, corresponding to the
traffic groups. After receiving a packet associated with one
of the traffic groups on a first port, the packet forwarding
device schedules the packet for transmission from a second
port based upon parameters corresponding to the traffic
group with which the packet is associated. Advantageously,
in this manner, a weighted fair queuing schedule that shares
bandwidth according to some set of rules may be achieved.
According to another aspect of the present invention, a
method is provided for managing bandwidth allocation and
traffic prioritization in a packet forwarding device. The
packet forwarding device receives information indicative of
a set of traffic groups. The packet forwarding device addi-
tionally receives information defining a Quality of Service
(QoS) policy for the traffic groups. After a packet is received
by the packet forwarding device, a traffic group with which
the packet is associated is identified. Subsequently, rather
than relying on an end-to-end signaling protocol for
scheduling, the packet is scheduled for transmission based
upon the QoS policy for the identified traffic group.
Therefore, bandwidth allocation and traffic prioritization are
based upon a set of administrative policies over which the
network manager retains control.
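
The two steps described here, identifying a packet's traffic group and then looking up the QoS policy for that group, can be sketched in Python as follows. The group names, the table contents, the profile fields and the order in which the identifiers are consulted are all assumptions made for illustration; the text only states that a group is identified and its policy applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QosProfile:
    min_bandwidth_pct: int
    max_bandwidth_pct: int
    priority: int


@dataclass
class Packet:
    src_mac: str
    ingress_port: int
    vlan_priority: Optional[int] = None


# Hypothetical manager-defined tables mapping identifiers to traffic groups.
MAC_GROUPS = {"00:11:22:33:44:55": "finance-server"}
PORT_GROUPS = {3: "lab"}

# Hypothetical QoS policy: one profile per traffic group.
POLICY = {
    "finance-server": QosProfile(20, 100, 7),
    "lab": QosProfile(0, 50, 1),
    "default": QosProfile(0, 100, 0),
}


def identify_group(packet: Packet) -> str:
    """Map a packet to a traffic group; MAC address is checked before port."""
    if packet.src_mac in MAC_GROUPS:
        return MAC_GROUPS[packet.src_mac]
    if packet.ingress_port in PORT_GROUPS:
        return PORT_GROUPS[packet.ingress_port]
    return "default"


def profile_for(packet: Packet) -> QosProfile:
    """Return the QoS profile that governs scheduling of this packet."""
    return POLICY[identify_group(packet)]


server_profile = profile_for(Packet("00:11:22:33:44:55", ingress_port=3))
lab_profile = profile_for(Packet("aa:bb:cc:dd:ee:ff", ingress_port=3))
```

Because both tables live in the forwarding device, changing a policy requires no change to end-stations, which is the contrast with end-to-end signaling drawn above.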

According to yet another aspect of the present invention,
a number of QoS queues are provided at each port of the
packet forwarding device. A current bandwidth metric is
determined for each of the QoS queues for a particular port.
The QoS queues are divided into two groups based upon
their respective bandwidth metrics and their respective minimum
bandwidth requirements. Subsequently, the groups are
used as a first level arbitration mechanism to select a QoS
queue that will source the next packet.

Other features of the present invention will be apparent
from the accompanying drawings and from the detailed
description which follows.

BRIEF DESCRIPTION OF THE SEVERAL
VIEWS OF THE DRAWINGS

The present invention is illustrated by way of example,
and not by way of limitation, in the figures of the accompanying
drawings and in which like reference numerals refer
to similar elements and in which:

FIG. 1A is a simplified block diagram of an exemplary
switch architecture in which one embodiment of the present
invention may be implemented.

FIG. 1B is a logical view of the interaction between
switch processing blocks according to one embodiment of
the present invention.

FIG. 2 is a flow diagram illustrating high level bandwidth
management and traffic prioritization processing according
to one embodiment of the present invention.

FIG. 3 is a flow diagram illustrating periodic evaluation of
QoS categories according to one embodiment of the present
invention.

FIG. 4 is a flow diagram illustrating next packet scheduling
according to one embodiment of the present invention.

in machine-executable instructions, which may be used to
cause a general-purpose or special-purpose processor programmed
with the instructions to perform the steps.
Alternatively, the steps may be performed by a combination
of hardware and software. While embodiments of the
present invention will be described with reference to a high
speed Ethernet switch, the method and apparatus described
herein are equally applicable to other types of network
devices or packet forwarding devices.

An Exemplary Switch Architecture

An overview of the architecture of a switch 100 in which
one embodiment of the present invention may be implemented
is illustrated by FIG. 1A. The central memory
architecture depicted includes multiple ports 105 and 110
each coupled via a channel to a filtering/forwarding engine
115. Also coupled to the filtering/forwarding engine 115 is
a forwarding database 120, a packet Random Access
Memory (RAM) 125, and a Central Processing Unit (CPU).

According to one embodiment, each channel is capable of
supporting a data transfer rate of one gigabit per second in
the transmit direction and one gigabit per second in the
receive direction, thereby providing 2 Gb/s full-duplex capability
per channel. Additionally, the channels may be configured
to support one Gigabit Ethernet network connection
or eight Fast Ethernet network connections.

The filtering/forwarding engine 115 includes an address
filter (not shown), a switch matrix (not shown), and a buffer
manager (not shown). The address filter may provide
bridging, routing, Virtual Local Area Network (VLAN)
tagging functions, and traffic classification. The switch
matrix [illegible] packet RAM 125. The buffer manager controls data buffers
and packet queue structures and controls and coordinates

DETAILED DESCRIPTION OF THE accesses to and from the packet RAM 125. 

INVENTION 3S The forwarding database 120 may store information use- 

A „ ., , , . ful for making forwarding decisions, such as layer 2 (e.g., 

A flexible, pohcy-based, mechanism for managing Medk Access Control j } k 3 ( 

monitoring and prioritizing . traffic within a network and { } an4/or , 4 ( Tran5port layer) forwarding 

aUocatmg bandwidth to achieve true Quality of Service informa a on , among other tMngs. The switch 100 forwards a 

(QoS) is described. Quality of Service in Uns context ^ kel feceived flt ^ ■ tQ fln t b 

essentially means that there is a quantifiable measure of the formi a search on ^ f orwarding dalabase using 

service being provided. The measure of service being pro- addre&s mformatkm conned within the hea der of the 

vded may be in terms of a packet loss rate a maximum received ket , f a matching entr ^ found> a forwarding 

delay a committed minimum bandwidth, or a lim.ted maxi- decision ig constmcted tnal indicales l0 which output por , 

mum bandwidth, for example. 45 ^ received packet should be forwardedj if otherwise, 

In the present invention, a number of QoS queues may be lhe packet is forwarded to the CPU 130 for assistance in 

provided at each port of a packet forwarding device, such as constructing a forwarding decision, 

a Local Area Network (LAN) switch. Based upon a set of packet RAM 12 $ provides buffering for packets and 

QoS parameters, various types of traffic can be distinguished acts as an elasticity buffer for adapting between incoming 

and associated with particular QoS queues. For example, 50 and outgoing bandwidth differences. Packet buffering is 

packets associated with a first traffic group may be placed discussed further below, 

onto a first QoS queue and packets associated with another Logical View of Exemplary Switch Processing 

traffic group may be placed onto a second QoS queue. When F IG. IB is a logical view of the interaction between 

a port is ready to transmit the next packet, a scheduling exemplary switch processing blocks that may be distributed 

mechanism may be employed to select which QoS queue of 5S throughout the switch 100. For example, some of the pro- 

the QoS queues associated with the port will provide the cessing may be performed by functional units within the 

next packet for transmission. ports of the switch and other processing may be performed 

In the following description, for the purposes of by the CPU 130 or by the address filter/switch matrix/buffer 

explanation, numerous specific details arc set forth in order manager 115. In any event, the processing can be concep- 

to provide a thorough understanding of the present inven- 60 tually divided into a first group of functions 160 dedicated 

lion. It will be apparent, however, to one skilled in the art to input processing and a second group of functions 185 

that the present invention may be practiced without some of dedicated to output processing. According to the present 

these specific details. In other instances, well-known struc- embodiment, the first group 160 includes a comparison 

hires and devices are shown in block diagram form. engine 155, an enqueue block 161, a packet classification 

The present invention includes various steps, which will 65 block 150, and a buffer manager 165. The second group 185 

be described below. The steps of the present invention may includes a dequeue block 162, a Quality of Service (QoS) 

be performed by hardware components or may be embodied category evaluation block 175, and a scheduler 170. 
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Additionally, a user interface (UI) 145 may be provided 
for receiving various parameters from the network manager. 
The UI may be text based or graphical. In one embodiment, 
the UI 145 may include an in-band HyperText Markup 
Language (HTML) browser-based management tool which 
may be accessed by any standard web browser. In any event, 
the goal of the UI 145 is to separate high-level policy 
components, such as traffic grouping and QoS profiles from 
the details of the internal switch hardware. Thus, user 
configuration time is minimized and a consistent interface is 
provided to the user. 

The UI 145 receives information indicative of one or 
more traffic groups. This information may be provided by 
the network manager. There are several ways to define a 
traffic group. Table 1 below illustrates a variety of traffic 
classification schemes that may be supported by the UI 145. 



TABLE 1

Traffic Classification

Policy Based Upon          Traffic Group Definition   OSI Layer

Applications               TCP Session                Transport Layer
                           UDP Session
                           RSVP Flow

Network Layer Topology     Network Layer Protocol     Network Layer
or Groups of Users         Subnet or IP Address
                           VLAN Identifier

End-Station Applications   MAC Address                Link Layer
                           802.1p or 802.1Q

Physical Topology          Physical Port              Physical Layer



The information used to identify a traffic group typically
depends upon the terms in which the QoS policy is defined.
If the QoS policy is based on applications, traffic groups may be
differentiated at the Transport layer by Transmission Control
Protocol (TCP) session or User Datagram Protocol (UDP)
session. For example, the network manager may provide
information indicative of TCP source and destination ports
and IP source and destination addresses to identify traffic
groups. However, if the QoS policy is based upon the
Network layer topology or groups of users, traffic groups
may be more conveniently defined by supplying information
regarding the Network layer protocol, such as Internet
Protocol (IP) or Internetwork Packet Exchange (IPX), the
subnet or IP addresses, or VLAN identifiers. If the QoS
policy is defined by end-station applications, then Media
Access Control (MAC) addresses, IEEE 802.1p priority
indications, or IEEE 802.1Q frames may be employed to
identify traffic groups. Finally, if the QoS policy is physical
topology based, physical port identifiers may be used to
differentiate traffic groups.

It should be noted that Table 1 merely presents an
exemplary set of traffic group identification mechanisms.
From the examples presented herein, additional, alternative,
and equivalent traffic grouping schemes and policy
considerations will be apparent to those of ordinary skill in the art.
For example, other state information may be useful for
purposes of packet classification, such as the history of
previous packets, the previous traffic load, the time of day,
etc.

It is appreciated that traffic classifications based upon the
traffic group definitions listed above may result in overlap.
Should the network manager define overlapping traffic
groups, the UI 145 may issue an error message and reject the
most recent traffic group definition, the UI 145 may issue a
warning message to the network manager and allow the
more specific traffic group definition to override a
conflicting general traffic group definition, or the UI 145 may be
configured to respond in another manner.

A number of QoS queues 180 may be provided at each of
the ports of a packet forwarding device. In one embodiment,
a mapping of traffic groups to QoS queues 180 may be
maintained. As traffic groups are provided by the network
manager, the UI 145 updates the local mapping of traffic
groups to QoS queues 180. This mapping process may be a
one-to-one mapping of the traffic groups defined by the
network manager to the QoS queues 180 or the mapping
process may be more involved. For example, there may be
more traffic groups than QoS queues 180, in which case,
more than one traffic group will be mapped to a single QoS
queue. Some consolidation rules for combining multiple
traffic groups into a single QoS queue will be discussed
below.

At any rate, by providing a layer of abstraction in this
manner, the network manager need not be burdened with the
underlying implementation details, such as the number of
QoS queues per port and other queuing parameters. Another
advantage achieved by this layer of abstraction between the
traffic group definitions and the physical QoS queues is the
fact that the UI 145 is now decoupled from the underlying
implementation. Therefore, the UI 145 need not be updated
if the hardware QoS implementation changes. For example,
software providing for traffic group definition need not be
changed simply because the number of QoS queues per port
provided by the hardware changes.

The input data stream is received by the comparison
engine 155 from input switch ports (not shown). Under the
direction of the packet classification process 150, the
comparison engine 155 determines with which of the previously
defined traffic groups a packet in the data stream is
associated. The packet classification block 150 may employ the
traffic group indications provided by the network manager to
provide the comparison engine 155 with information
regarding locations and fields to be compared or ignored within the
header of a received packet, for example. It should be
appreciated if the comparison required for traffic
classification is straightforward, such as in a conventional packet
forwarding device, then the comparison engine 155 and the
packet classification block 150 may be combined.

The packet classification block 150 in conjunction with
the UI 145 provide a network manager with a flexible
mechanism to control traffic prioritization and bandwidth
allocation through the switch 100. Importantly, no
end-to-end signaling protocol needs to be implemented by the
network devices. For example, the end-station that is to
receive the data stream need not reserve bandwidth on each
of the intermediate devices between it and the source of the
data stream. Rather, a packet forwarding device employing
the present invention can provide some benefit to the
network without requiring routers and/or end-stations to do
anything in particular to identify traffic. Thus, traffic priority
may be enforced by the switch 100 and QoS may be
delivered to applications without altering routers or
end-stations.
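
The field comparison performed on received packet headers can be sketched as follows. This is a minimal illustration only, not the switch's actual implementation: the header field names (`ip_proto`, `dst_port`, `vlan`), the example groups, and the rule that the group comparing the most fields wins an overlap are all assumptions made for the example. The last of these echoes just one of the overlap responses the text permits.

```python
# Hypothetical traffic group definitions: each maps header fields that must
# be compared to their required values. Fields not listed are ignored.
EXAMPLE_GROUPS = {
    "video": {"ip_proto": "udp", "dst_port": 5004},
    "web": {"ip_proto": "tcp", "dst_port": 80},
    "engineering": {"vlan": 20},
}


def classify(header, groups, default="default"):
    """Return the name of the most specific traffic group matching header.

    A group matches when every field it names equals the value found in
    the packet header. When several groups match, the one that compares
    the most fields is preferred (an assumed tie-breaking rule).
    """
    best = None
    for name, criteria in groups.items():
        if all(header.get(field) == value for field, value in criteria.items()):
            if best is None or len(criteria) > len(groups[best]):
                best = name
    return best if best is not None else default
```

A packet on VLAN 20 that is also a UDP packet to port 5004 matches both "engineering" and "video"; under the assumed rule it is placed in "video" because that definition compares more fields.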

According to one embodiment, the buffer manager 165 
participates in policy based QoS by controlling the alloca- 
tion of buffers within the packet RAM 125. Buffers may be 
dynamically allocated to QoS queues 180 as needed, within 
constraints established by QoS profile attributes, which are 
discussed below. The buffer manager 165 may maintain 
several programmable variables for each QoS queue. For 
example, a Minimum Buffer Allocation and a Maximum 
Queue Depth may be provided for each QoS queue. The 
Minimum Buffer Allocation essentially reserves some mini- 
mum number of buffers in the packet RAM 125 for the QoS 
queue with which it is associated. The Maximum Queue
Depth establishes the maximum number of buffers that can
be placed on a given QoS queue. The buffer manager 165
also maintains a Current Queue Depth for each QoS queue
to assure the maximum depth is not exceeded. For example,
before allowing a buffer to be added to a given QoS queue,
the buffer manager 165 may compare the Maximum Queue
Depth to the Current Queue Depth to ensure the Maximum
Queue Depth is not exceeded.

Variables are also maintained for tracking free buffers in
the packet RAM 125. At initialization, a Buffers Free Count
contains the total number of buffers available in the packet
RAM 125 and a Buffers Reserved Count contains the sum of
the minimum buffer allocations for the QoS queues 180. As
packets are received they are stored in free buffers, and the
Buffers Free Count is decremented by the number of buffers
used for such storage. After the appropriate QoS queue has
been identified the buffer manager 165 instructs the enqueue
block 161 to add the packet to the QoS queue. The enqueue
block 161 links the packet to the identified queue provided
that the Current Queue Depth is less than the Maximum
Queue Depth and either (1) the Current Queue Depth is less
than the Minimum Buffer Allocation or (2) the Buffers
Reserved Count is less than the Buffers Free Count.
Therefore, if a QoS queue exceeds its reserve of buffers
(e.g., Minimum Buffer Allocation), to the extent that
additional buffers remain free, the QoS queue may continue to
grow. Otherwise, the enqueue block 161 will discard the
packet, the buffers are returned to the free pool, and the
Buffers Free Count is increased by the number of buffers that
would have been consumed by the packet. When a packet is
successfully linked to a QoS queue, the Current Queue
Depth for that QoS queue is increased by the number of
buffers used by the packet. If, prior to the addition of the
packet to the queue, the Current Queue Depth was less than
the Minimum Buffer Allocation then the Buffers Reserved
Count is decreased by the lesser of (1) the number of buffers
in the packet or (2) the difference between the Current
Queue Depth and the Minimum Buffer Allocation.

The QoS category evaluation process 175 separates the
QoS queues into a plurality of categories based upon a set of
bandwidth parameters. The scheduler 170 uses the grouping
provided by the QoS category evaluation process 175 to
select an appropriate QoS queue for sourcing the next packet
for a particular port. The evaluation of QoS queue categories
may be performed periodically or upon command by the
scheduler 170, for example. Periodic evaluation of QoS
categories and scheduling is discussed in further detail
below.

Responsive to the scheduler 170 the dequeue block 162
retrieves a packet from a specified QoS queue. After the
packet has been transmitted, the buffer variables are
updated. The Buffers Free Count is increased and the
Current Queue Depth is decreased by the number of buffers
utilized to store the packet. If the resulting Current Queue
Depth is less than the Minimum Buffer Allocation, then the
Buffers Reserved Count is increased by the lesser of the
number of buffers utilized to store the packet or the
difference between the Current Queue Depth and the Minimum
Buffer Allocation.
QoS Profile Attributes

Setting QoS policy is a combination of identifying traffic
groups and defining QoS profiles for those traffic groups.
According to one embodiment, each individual traffic group
may be associated with a QoS profile. However, in
alternative embodiments, multiple traffic groups may share a
common QoS profile. Having described traffic group
classification and identification above, QoS profile attributes (also
referred to as parameters) will now be discussed.

Several queuing mechanisms may be implemented using
one or more of the following parameters associated with a
traffic group: (1) minimum bandwidth, (2) maximum
bandwidth, (3) peak bandwidth, (4) maximum delay, and (5)
relative priority. In general, the minimum, maximum, and
peak bandwidth parameters may be expressed in Mbps, a
percentage of total bandwidth, or any other convenient
representation.

Minimum bandwidth indicates the minimum amount of
bandwidth a particular traffic group needs to be provided
over a defined time period. If the sum of the minimum
bandwidth for all traffic groups defined is less than 100% of
the available bandwidth, then the scheduling processing,
discussed below, can assure that each traffic group will
receive at least the minimum bandwidth requested.

Maximum bandwidth is the maximum sustained
bandwidth the traffic group can realize over a defined time period.
In contrast, peak bandwidth represents the bandwidth a
traffic group may utilize during a particular time interval in
excess of the maximum bandwidth. The peak bandwidth
parameter may be used to limit traffic bursts for the traffic
group with which it is associated. The peak bandwidth also
determines how quickly the traffic group's current
bandwidth will converge to the maximum bandwidth. By
providing a peak bandwidth value that is much higher than the
maximum bandwidth, if sufficient bandwidth is available,
the maximum bandwidth will be achieved relatively quickly.
In contrast, a peak bandwidth that is only slightly higher than
the maximum bandwidth will cause the convergence to the
maximum bandwidth to be more gradual.

Maximum delay specifies a time period beyond which
further delay cannot be tolerated for the particular traffic
group. Packets comprising the traffic group that are
forwarded by the switch 100 are guaranteed not to be delayed
by more than the maximum delay specified.

Relative priority defines the relative importance of a
particular traffic group with respect to other traffic groups.
As will be discussed further below, within the same QoS
category, traffic groups with a higher priority are preferred
over those with lower priorities.

This small set of parameters in combination with the
variety of traffic classification schemes gives a network
manager enormous control and flexibility in prioritizing and
managing traffic flowing through packet forwarding devices
in a network. For example, the QoS profile of a video traffic
group, identified by UDP session, might be defined to have
a high priority and a minimum bandwidth of 5 Mbps, while
the QoS profile of an engineering traffic group, identified by
VLAN, may be set to a second priority, a minimum
bandwidth of 30 Mbps, a maximum bandwidth of 50 Mbps, and
a peak bandwidth of 60 Mbps. Concurrently, the QoS profile
of a World Wide Web (WWW) traffic group, identified by
protocol (e.g., IP), may be set to have a low priority, a
minimum bandwidth of 0 Mbps, a maximum bandwidth of
100%, and a peak bandwidth of 100%.
Consolidation Rules

It was mentioned earlier that multiple traffic groups may
be mapped to a single QoS queue. This may be
accomplished by maintaining an independent set of variables (e.g.,
minimum bandwidth, maximum bandwidth, peak
bandwidth, maximum delay, and relative priority) for each
QoS queue in addition to those already associated with each
traffic group and following the general consolidation rules
outlined below.

Briefly, when the mapping from traffic groups to QoS
queues is one-to-one, the determination of a particular QoS
queue's attributes is straightforward. The QoS queue's
attributes simply equal the traffic group's attributes.
However, when combining multiple traffic groups that do 
not share a common QoS profile onto a single QoS queue, 
the following general consolidation rules are suggested: (1)
add minimum attributes of the traffic groups being combined 
to arrive at an appropriate minimum attribute for the target 
QoS queue (e.g., the QoS queue in which the traffic will be 
merged), (2) use the largest of maximum attributes to arrive 
at an appropriate value for a maximum attribute for the 
target QoS queue, and (3) avoid merging traffic groups that 
have different relative priorities. This last rule suggests the 
number of priority levels provided should be less than or 
equal to the number of QoS queues supported by the 
implementation to assure traffic groups with different pri- 
orities are not combined in the same QoS queue. 
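
The three consolidation rules can be sketched as below. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the `QueueAttributes` type and `consolidate` function are hypothetical names, bandwidths are taken as percentages of the link, and peak bandwidth is treated as a "maximum attribute" under rule (2). Maximum delay is left out because the rules do not say how it should be merged.

```python
from dataclasses import dataclass


@dataclass
class QueueAttributes:
    min_bw: float    # minimum bandwidth, percent of link (assumed unit)
    max_bw: float    # maximum bandwidth, percent of link
    peak_bw: float   # peak bandwidth, percent of link
    priority: int    # relative priority


def consolidate(groups):
    """Merge several traffic groups' attributes into one QoS queue's."""
    priorities = {g.priority for g in groups}
    if len(priorities) != 1:
        # Rule (3): avoid merging groups that have different priorities.
        raise ValueError("traffic groups must share one relative priority")
    return QueueAttributes(
        min_bw=sum(g.min_bw for g in groups),    # rule (1): add minimums
        max_bw=max(g.max_bw for g in groups),    # rule (2): largest maximum
        peak_bw=max(g.peak_bw for g in groups),  # assumed to follow rule (2)
        priority=priorities.pop(),
    )
```

For instance, merging a group with a 10% minimum and 50% maximum with a group with a 20% minimum and 40% maximum yields a queue with a 30% minimum and a 50% maximum.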

Importantly, when a network manager has determined that 
multiple traffic groups will share a common QoS profile, the
consolidation rules need not apply, as the network manager 
has already, in effect, manually consolidated the parameters. 
Bandwidth Management and Traffic Prioritization 

Having described an exemplary environment in which
one embodiment of the present invention may be
implemented, bandwidth management and traffic
prioritization will now be described with reference to FIG. 2. FIG. 2
is a flow diagram illustrating the high level bandwidth
management and traffic prioritization processing according
to one embodiment of the present invention. In this
embodiment, at step 210, a manager-defined QoS policy
may be received via the UI 145, for example. The QoS
policy is a combination of traffic groups and QoS profile
attributes corresponding to those traffic groups.
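
As a rough illustration of a policy received at step 210, the sketch below represents a QoS policy as a mapping from traffic group name to profile, using the video, engineering, and WWW figures given earlier in this description. The `QosProfile` type, the numeric priority encoding (larger means more important), and the 100 Mbps link used to turn "100%" into Mbps are assumptions made for the example. The check reflects the earlier statement that minimum bandwidths can be assured when their sum is less than the available bandwidth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QosProfile:
    priority: int                         # larger = more important (assumed)
    min_mbps: float = 0.0                 # minimum bandwidth
    max_mbps: Optional[float] = None      # None = not specified
    peak_mbps: Optional[float] = None
    max_delay_ms: Optional[float] = None


LINK_MBPS = 100.0  # assumed link speed, so that 100% equals 100 Mbps

EXAMPLE_POLICY = {
    "video": QosProfile(priority=3, min_mbps=5),
    "engineering": QosProfile(priority=2, min_mbps=30, max_mbps=50, peak_mbps=60),
    "www": QosProfile(priority=1, min_mbps=0, max_mbps=LINK_MBPS, peak_mbps=LINK_MBPS),
}


def minimums_can_be_assured(policy, link_mbps):
    """True when the sum of minimum bandwidths is below the link capacity."""
    return sum(p.min_mbps for p in policy.values()) < link_mbps
```

With these figures the minimums sum to 35 Mbps, so they fit on the assumed 100 Mbps link but would not fit on a 30 Mbps link.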

At step 220, a packet is received by the switch 100. Before
the packet can be placed onto a QoS queue for transmission,
the traffic group to which the packet belongs is identified at
step 230. Typically, information in the packet header, for
example, can be compared to the traffic group criteria
established by the network manager to identify the traffic
group to which the packet belongs. This comparison or
matching process may be achieved by programming filters
in the switch 100 that allow classification of traffic.
According to one embodiment, the packet may be identified using
the traffic group definitions listed in Table 1.

At step 250, enqueue processing is performed. The packet 
is added to the rear of the appropriate QoS queue for the 
identified traffic group. Importantly, if a maximum delay has 
been assigned to the traffic group with which the packet is 
associated, then the packet should either be dropped or 
transmitted within the period specified. According to one 
embodiment, this may be accomplished by limiting the 
depth (also referred to as length) of the corresponding QoS 
queue. Given the minimum bandwidth of the QoS queue and 
the maximum delay the traffic group can withstand, a 
maximum depth for the QoS queue can be calculated. If the 
QoS queue length remains less than or equal to the maxi- 
mum length, then the packet is added to the QoS queue. 
However, if the QoS queue length would exceed the maxi- 
mum length by the addition, then the packet is dropped. 
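
The enqueue-side bookkeeping can be sketched as follows. The depth calculation is an assumption: the text says only that a maximum depth "can be calculated" from the minimum bandwidth and the maximum delay, and the sketch uses the backlog that a queue served at its minimum bandwidth drains within the delay bound. The `BufferManager` class is a hypothetical name; its admission test and counter updates follow the buffer manager description given earlier, with depths counted in buffers.

```python
from dataclasses import dataclass


def max_depth_bytes(min_bw_bps: int, max_delay_ms: int) -> int:
    """Backlog, in bytes, that drains within max_delay_ms at min_bw_bps."""
    return min_bw_bps * max_delay_ms // 8000


@dataclass
class QueueState:
    min_alloc: int   # Minimum Buffer Allocation
    max_depth: int   # Maximum Queue Depth
    depth: int = 0   # Current Queue Depth


class BufferManager:
    def __init__(self, total_buffers: int):
        self.free = total_buffers  # Buffers Free Count
        self.reserved = 0          # Buffers Reserved Count
        self.queues = {}

    def add_queue(self, name, min_alloc, max_depth):
        self.queues[name] = QueueState(min_alloc, max_depth)
        self.reserved += min_alloc

    def enqueue(self, name, n):
        """Try to link a packet occupying n buffers; return True if linked."""
        q = self.queues[name]
        self.free -= n  # the packet is stored in free buffers on receipt
        admit = q.depth < q.max_depth and (
            q.depth < q.min_alloc or self.reserved < self.free
        )
        if not admit:
            self.free += n  # discard: buffers return to the free pool
            return False
        if q.depth < q.min_alloc:
            # The packet consumes part of this queue's unused reservation.
            self.reserved -= min(n, q.min_alloc - q.depth)
        q.depth += n
        return True

    def dequeue(self, name, n):
        """Account for a transmitted packet that occupied n buffers."""
        q = self.queues[name]
        self.free += n
        q.depth -= n
        if q.depth < q.min_alloc:
            self.reserved += min(n, q.min_alloc - q.depth)
```

As in the text, the admission test compares the depth before the packet is added, so a multi-buffer packet admitted just under the limit can carry the depth past it. A stricter variant would test the depth after the addition.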

At step 260, scheduling is performed. The scheduling/ 
dequeuing processing involves determining the appropriate 
QoS queue group, selecting the appropriate QoS queue 
within that QoS queue group, and removing the packet at the 
front of the selected QoS queue. This selected packet will be 
the next packet the port transmits. Scheduling will be 
discussed further below. 
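
The three stages named in step 260 can be sketched as a single selection function. The sketch follows the arbitration order this description gives under Scheduling Processing: Category A before Category B, then relative priority, then least recently used. The `Queue` type, the convention that a larger number means higher priority, and the `last_served` counter used to find the least recently used queue are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    name: str
    category: str     # "A", "B", or "C" from the last category evaluation
    priority: int     # larger = higher relative priority (assumed)
    pending: int      # number of packets waiting
    last_served: int  # tick at which the queue last sourced a packet


def select_queue(queues: List[Queue]) -> Optional[Queue]:
    """Pick the queue that should source the next packet, or None."""
    for category in ("A", "B"):  # first level; Category C is ineligible
        eligible = [q for q in queues if q.category == category and q.pending > 0]
        if not eligible:
            continue
        top = max(q.priority for q in eligible)            # second level
        tied = [q for q in eligible if q.priority == top]
        return min(tied, key=lambda q: q.last_served)      # third level: LRU
    return None
```

The caller would then remove the packet at the front of the returned queue and record the current tick in `last_served`.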
Evaluation of QoS Categories 

According to one embodiment of the present invention, it 
is advantageous to divide the QoS queues into at least two
categories. The categories may be defined based upon the
maximum bandwidth, the minimum bandwidth, the peak 
bandwidth, and the "current bandwidth." The current band- 
width should not be mistaken for a bandwidth at an instant 
in time, rather the current bandwidth is a moving average 
that is updated periodically upon the expiration of a prede- 
termined time period. Empirical data suggests this predeter- 
mined time period should be on the order of ten packet 
times, wherein a packet time is the time required to transmit 
a packet. However, depending upon the environment and the 
nature of the traffic, a value in the range of one to one 
hundred packet times may be more suitable. 

The members of the first category ("Category A") are 
those QoS queues which have a current bandwidth that is 
below their peak bandwidth and below their minimum 
bandwidth. Members of the second category ("Category B") 
include those QoS queues that have a current bandwidth that 
is greater than or equal to their minimum bandwidth, but less
than both their maximum bandwidth and their peak band-
width. The remaining QoS queues (e.g., those having a
current bandwidth that is greater than or equal to either the 
peak bandwidth or the maximum bandwidth) are ineligible 
for transmission. These QoS queues that are ineligible for 
transmission can be considered a third category ("Category 
C"). With this overview of QoS categories, an exemplary 
process for periodic evaluation of QoS categories will now 
be described. 
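
The three category definitions above translate directly into a small function. This is a sketch only; the function and argument names are chosen for the example, and all four bandwidths are assumed to be expressed in the same unit.

```python
def qos_category(curr_bw, min_bw, max_bw, peak_bw):
    """Return "A", "B", or "C" for a QoS queue's current bandwidth."""
    if curr_bw < peak_bw and curr_bw < min_bw:
        return "A"  # below peak and below minimum
    if curr_bw >= min_bw and curr_bw < max_bw and curr_bw < peak_bw:
        return "B"  # at or above minimum, below both maximum and peak
    return "C"      # at or above peak or maximum: ineligible
```

Note that a queue configured with a minimum bandwidth of zero can never fall into Category A, since its current bandwidth is never below zero.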

FIG. 3 is a flow diagram illustrating periodic evaluation of 
QoS categories according to one embodiment of the present 
invention. In this embodiment, at step 310, processing loops 
until the predetermined evaluation time period has expired. 
For example, a test may be performed to determine if the 
current time is greater than or equal to the last evaluation 
time plus the predetermined evaluation time interval. 
Alternatively, the evaluation process may be triggered by an 
interrupt. In any event, when it is time to evaluate the QoS 
queue categorization, processing continues with step 320. 

It will be appreciated that the time interval chosen for the 
predetermined evaluation time period should not be too long 
or too short. If the time interval is too long, one QoS queue 
might be allowed to monopolize the link until its maximum 
bandwidth is achieved while other QoS queues remain idle. 
If the time interval is too short, transmitting a single packet 
or remaining idle for a single packet time may cause the QoS
queue to become a member of a different QoS category (e.g., 
the single transmission may cause the current bandwidth to 
exceed the maximum bandwidth or the single idle time may 
cause the current bandwidth to fall below the minimum 
bandwidth) because the moving average moves very quickly 
over short time intervals. 

At step 330, the current bandwidth for a particular QoS 
queue is set to the current bandwidth for that QoS queue as 
calculated in the previous time interval multiplied by a first 
weighting factor plus the actual bandwidth that particular 
QoS queue received in the previous time interval multiplied 
by a second weighting factor, wherein the weighting factors 
may be selected to achieve the desired level of responsive- 
ness in the current bandwidth metric. For example, it may be 
desirable to have the current bandwidth converge to within 
a certain percentage of a sustained bandwidth if that band- 
width has been sustained for a certain amount of time. 
Exemplary weighting factors are in the form (w-1)/w and
1/w, respectively. Using weighting factors of 15/16 for the
first weighting factor and a value of 1/16 for the second
weighting factor, for example, the current bandwidth will
reflect 50% of a step within 13 time intervals, 80% of a step
within 27 time intervals, and will be within 2% of the
sustained bandwidth in approximately 63 time intervals
(assuming a maximum and peak bandwidth of 100%).
Alternative ratios and current bandwidth metrics will be
apparent to those of ordinary skill in the art.
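
The moving average of step 330 can be sketched as below, with w = 16 giving the exemplary 15/16 and 1/16 weights. The function names are chosen for the example. The second function replays a step from zero to a sustained bandwidth of 1.0, which is equivalent to the closed form 1 - ((w-1)/w)^n for the fraction of the step reflected after n intervals.

```python
def update_current_bw(prev_current, actual, w=16):
    """One step-330 update: weight the old average by (w-1)/w, the new sample by 1/w."""
    return prev_current * (w - 1) / w + actual / w


def fraction_of_step(n, w=16):
    """Fraction of a 0-to-1 step reflected in the average after n intervals."""
    current = 0.0
    for _ in range(n):
        current = update_current_bw(current, 1.0, w)
    return current
```

A larger w makes the metric smoother but slower to follow changes in the bandwidth a queue actually receives.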

After the current bandwidth has been evaluated for a QoS 
queue, at step 340, the QoS queue bandwidth parameters can 
be compared to the current bandwidth to determine to which 
QoS category the QoS queue belongs. As described above, 
if (CURR_BW<PEAIC_BW) and (CURR_BW<MIN_ 



BW), then the QoS queue is associated with Category A at 410 niTiCtrated 

step 350. If (CURR_BW^MIN_BW) and ((CURR_ 10 ^ Cp 41 °' " , lUustratcd - 



evaluated, the QoS queues having the same priority will be 
rotated through in a predetermined order or scheduled such 
that the QoS queue that has not provided a packet for 
transmission recently will be given such an opportunity- 
After selecting a QoS queue in this manner, processing 
continues with step 470. 

At step 470, a packet is dequeued from the selected QoS 
queue and the packet is transmitted by the port at step 480. 
This scheduling process may be repeated by looping back to 



15 



BW<MAX_BW) and (CURR_BW<PEAK_BW)), then 
the QoS queue is associated with Category B at step 360. If 
(CURR_BW^PEAK_BW) or (CURR_BW^MAX_ 
BW), then the QoS queue is associated with Category C at 
step 370. 

At step 380, if all of the QoS queues have been evaluated, 
then processing branches to step 310; otherwise, processing 
continues with step 330. 
Scheduling Processing 

Briefly, at each port, three levels of arbitration may be 20 
employed to select the appropriate QoS queue from which to 
transmit the next packet. The flrst level of arbitration selects 
among the QoS categories. Category A is given priority if 
any member QoS queues have one or more pending packets. 
Otherwise, a QoS queue with one or more pending packets 25 
of Category B is selected. According to one embodiment, the 
relative priority assigned to each QoS queue may be used as 
a second level of arbitration. In this manner, when multiple 
QoS queues satisfy the first level arbitration, a higher 
priority QoS queue is favored over a lower priority QoS 
queue. Finally, when there is a tie at the second level of 
arbitration (e.g., two or more QoS queues in the same QoS 
category have the same relative priority), a round robin or 
least recently used (LRU) scheme may be employed to 
select from among the two or more QoS queues until the 
QoS categories are evaluated. 

Assuming a periodic evaluation of QoS categories is
being performed, the scheduling processing need not include
such evaluation and the scheduling processing may be
performed as illustrated by FIG. 4, according to one
embodiment of the present invention. In the embodiment depicted,
at step 410, processing loops until the port associated with
the group of QoS queues being evaluated indicates it is ready
to receive the next packet for transmission. For example, the
port may be polled to determine its transmission status.
Alternatively, the scheduling process may be triggered by an
interrupt. In any event, when the port is ready for the next
packet, processing continues with step 420.

At step 420, a QoS category is selected from which a QoS
queue will provide the next packet for transmission. As
described above, priority is given to the category containing
QoS queues with pending data that are below the peak
bandwidth and minimum bandwidth (e.g., Category A).
However, if no QoS queues meet these criteria, Category B is
selected.

At step 430, if multiple QoS queues are members of the
selected QoS category, processing continues with step 440;
otherwise, processing branches to step 470.

At step 440, the relative priorities of the QoS queues are
used to select among the QoS queues of the selected
category that have pending data.

At step 450, if two or more QoS queues have the same
priority, then processing continues with step 460. Otherwise,
if a QoS queue is found to have the highest relative priority,
then processing branches to step 470.

At step 460, the tie is resolved by performing round robin
or LRU scheduling. That is, until the QoS categories are

Queuing Schemes

A variety of different queuing mechanisms may be
implemented using various combinations of the QoS profile
attributes discussed above. Table 2 below illustrates
exemplary queuing mechanisms and the corresponding
configurations of the QoS profile attributes.

TABLE 2

Queuing Mechanism Configurations

Queuing Mechanism         QoS Profile Attribute   Value
------------------------  ----------------------  ---------------------
Strict Priority Queuing   Minimum Bandwidth       0%
                          Maximum Bandwidth       100%
                          Peak Bandwidth          100%
                          Maximum Delay           N/A
                          Relative Priority       PRIORITY_i

Round Robin/              Minimum Bandwidth       0%
Least Recently Used       Maximum Bandwidth       100%
Queuing                   Peak Bandwidth          100%
                          Maximum Delay           N/A
                          Relative Priority       <same for all queues>

Weighted Fair Queuing     Minimum Bandwidth       >0%
                          Maximum Bandwidth       MAX_BW_i
                          Peak Bandwidth          PEAK_BW_i
                          Maximum Delay           N/A
                          Relative Priority       <same for all queues>

PRIORITY_i represents a programmable priority value for
a particular QoS queue, i. Similarly, MAX_BW_i and
PEAK_BW_i represent programmable maximum bandwidths
and peak bandwidths, respectively, for a particular
QoS queue, i.

For a strict priority scheme, each QoS queue's minimum
bandwidth is set to zero percent, each QoS queue's maximum
bandwidth is set to one hundred percent, and each QoS
queue's peak is set to one hundred percent. In this manner,
the current bandwidth will never be less than the minimum
bandwidth, and the current bandwidth will never exceed
either the peak bandwidth or the maximum bandwidth. In
this configuration, all QoS queues will be associated with
Category B since no QoS queues will satisfy the criteria of
either Category A or Category B. Ultimately, by configuring
the QoS profile attributes in this manner, the second level of
arbitration (e.g., the relative priority of the QoS queues)
determines which QoS queue is to source the next packet.

For a pure round robin or least recently used (LRU)
scheme, the QoS profile attributes are as above, but
additionally all QoS queue priorities are set to the same value. In
this manner, the third level of arbitration determines which
QoS queue is to source the next packet.

Finally, weighted fair queuing can be achieved by
assigning, at least, a value greater than zero percent to the
desired minimum bandwidth. By assigning a value greater
than zero to the minimum bandwidth parameter, the particular
QoS queue is assured to get at least that amount of
bandwidth on average because the QoS queue will be
associated with Category A until at least its minimum
bandwidth is satisfied. Additionally, different combinations
of values may be assigned to the peak and maximum
bandwidths to prevent a particular QoS queue from
monopolizing the link.
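The three configurations described above can be written down as per-queue attribute sets. The sketch below is illustrative only: the `QosProfile` record and the three helper functions are hypothetical names introduced here, bandwidths are expressed as percentages of the link, and Maximum Delay is omitted because it is N/A in all three configurations.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class QosProfile:
    min_bw: float   # minimum bandwidth, percent of link
    max_bw: float   # maximum bandwidth, percent of link
    peak_bw: float  # peak bandwidth, percent of link
    priority: int   # relative priority


def strict_priority(priorities: Sequence[int]) -> List[QosProfile]:
    # 0% / 100% / 100% with a distinct priority per queue:
    # the second level of arbitration decides.
    return [QosProfile(0.0, 100.0, 100.0, p) for p in priorities]


def round_robin(num_queues: int, shared_priority: int = 0) -> List[QosProfile]:
    # Same as strict priority, but every queue shares one priority:
    # the third level of arbitration (round robin or LRU) decides.
    return [QosProfile(0.0, 100.0, 100.0, shared_priority)
            for _ in range(num_queues)]


def weighted_fair(min_bws: Sequence[float], max_bws: Sequence[float],
                  peak_bws: Sequence[float],
                  shared_priority: int = 0) -> List[QosProfile]:
    # A minimum bandwidth above 0% gives each queue its guaranteed share.
    if any(m <= 0 for m in min_bws):
        raise ValueError("weighted fair queuing needs a minimum bandwidth > 0%")
    return [QosProfile(mn, mx, pk, shared_priority)
            for mn, mx, pk in zip(min_bws, max_bws, peak_bws)]
```

For example, `weighted_fair([50, 30, 20], [80, 60, 40], [100, 100, 100])` would describe three queues with guaranteed shares of 50%, 30% and 20% of the link.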
Alternative Embodiments 

While evaluation of QoS categories has been described 
above as occurring periodically, this evaluation may also be 
triggered by the occurrence of a predetermined event. 
Alternatively, evaluation of QoS categories may take place 
as part of the scheduling processing rather than as part of a 
separate periodic background process. 

While a relationship between the number of priority levels 
and the number of QoS queues has been suggested above, it 
is appreciated that the number of QoS queues may be 
determined independently of the number of priority levels. 



Further, it is appreciated that the number of QoS queues
provided at each port may be fixed for every port or
alternatively a variable number of QoS queues may be
provided for each port.

Finally, in alternative embodiments, weighting factors
and ratios other than those suggested herein may be used to
adjust the current bandwidth calculation for a particular
implementation.

In the foregoing specification, the invention has been
described with reference to specific embodiments thereof. It
will, however, be evident that various modifications and
changes may be made thereto without departing from the
broader spirit and scope of the invention. The specification
and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.

What is claimed is:

1. A method comprising:

receiving at a packet forwarding device information
indicative of one or more traffic groups;

receiving at the packet forwarding device one or more
bandwidth parameters for at least one of the one or
more traffic groups;

receiving at a first port of a plurality of ports a packet
associated with the at least one traffic group;

enqueuing the packet onto a queue associated with the at
least one traffic group;

scheduling the packet for transmission from a second port
of the plurality of ports based upon the one or more
bandwidth parameters for the at least one traffic group
with which the packet is associated; by
periodically evaluating a current bandwidth metric for
the queue; by

determining an actual bandwidth for a prior time
period;

determining a bandwidth metric for the prior time
period; and

combining a portion of the actual bandwidth for the
prior time period with a portion of the bandwidth
metric for the prior time period to arrive at the
current bandwidth metric; and
dequeuing the packet from the queue if the current
bandwidth metric meets a predetermined relationship
with the one or more bandwidth parameters.

2. The method of claim 1, wherein the network employs
a non-deterministic access protocol.

3. The method of claim 2, wherein the non-deterministic
access protocol is Carrier Sense Multiple Access with
Collision Detection (CSMA/CD).

4. The method of claim 1, wherein the one or more
bandwidth parameters include an indication regarding a
minimum amount of bandwidth the at least one traffic group
needs to be provided over a defined time period.

5. The method of claim 1, wherein the one or more
bandwidth parameters include an indication regarding a
maximum sustained bandwidth the at least one traffic group
can realize over a defined time period.

6. The method of claim 5, wherein the one or more
bandwidth parameters include an indication regarding a
peak bandwidth representing a bandwidth the at least one
traffic group may utilize during a particular time interval in
excess of the maximum bandwidth.

7. The method of claim 1, further comprising: classifying
the packet as being associated with the at least one traffic
group; and determining a quality of service queue with
which the at least one traffic group is associated.

8. The method of claim 1, wherein QoS profile attributes
associated with each of the one or more traffic groups
include a maximum delay, specifying a time period beyond
which further delay cannot be tolerated for the particular
traffic group.

9. The method of claim X, wherein the other QoS profile
attributes associated with each of the one or more traffic
groups include a Relative Priority, defining the relative
importance of a particular traffic group with respect to other
traffic groups.

10. A method of bandwidth management and traffic
prioritization for use in a network of devices, the method
comprising the steps of:



defining at a packet forwarding device information indica- 
tive of one or more traffic groups; 
defining at the packet forwarding device information 
indicative of a quality of service (QoS) policy for at 
least one of the one or more traffic groups; 
receiving a packet at a first port of a plurality of ports; 
identifying a first traffic group of the one or more traffic 

groups with which the packet is associated; 
providing a plurality of QoS queues; 
mapping the first traffic group to a first QoS queue of the 

plurality of QoS queues; and 
scheduling the packet for transmission from a second port 
of the plurality of ports based upon the QoS policy for 
the first traffic group, and wherein the scheduling is 
independent of end-to-end signaling; said scheduling 
including: 

determining a current bandwidth metric for each of the 

plurality of QoS queues; 
dividing the plurality of QoS queues into at least a first 
group and a second group based upon the current 
bandwidth metrics and a minimum bandwidth 
requirement associated with each of the plurality of 
QoS queues; and 
if the first group includes at least one QoS queue, then 
transmitting a packet from the at least one QoS 
queue; otherwise transmitting a packet from a QoS 
queue associated with the second group. 

11. The method of claim 10, wherein the network of
devices employs a non-deterministic access protocol.

12. The method of claim 11, wherein the non- 
deterministic access protocol is Carrier Sense Multiple 
Access with Collision Detection (CSMA/CD). 

13. A method of bandwidth management and traffic
prioritization for use in a network of devices, the method
comprising:

receiving at a packet forwarding device information 
indicative of one or more traffic groups, the information 
indicative of the one or more traffic groups including 
Internet Protocol (IP) subnet membership; 
receiving at the packet forwarding device information 
defining a quality of service (QoS) policy for at least 
one of the one or more traffic groups, the QoS policy
including at least a minimum bandwidth; 
providing a plurality of queues at each of a plurality of 
output ports; 

associating the one or more traffic groups with the plu- 
rality of queues based upon the minimum bandwidth; 
and 

scheduling a packet for transmission from one of the 
plurality of queues onto the network. 

14. A method of bandwidth management and traffic pri- 
oritization for use in a network of devices, the method 
comprising: 

providing a plurality of quality of service (QoS) queues at 
each of a plurality of output ports, each of the plurality 
of QoS queues associated with a minimum queue 
bandwidth requirement; 
adding a packet to one of the plurality of QoS queues 
based upon a traffic group with which the packet is 
associated; and 
scheduling a next packet for transmission onto the net- 
work from one of the plurality of QoS queues at a 
particular output port of the plurality of output ports by: 
determining a current bandwidth metric for each of the 

plurality of QoS queues, 
dividing the plurality of QoS queues into at least a first 
group and a second group based upon the current 
bandwidth metrics and the minimum queue band- 
width requirements, and 
if at least one QoS queue of the plurality of QoS 
queues, so divided, is associated with the first group, 
then transmitting a packet from the at least one QoS 
queue; otherwise transmitting a packet from a QoS 
queue of the plurality of QoS queues associated with 
the second group. 

15. The method of claim 14, wherein the current band- 
width for a particular QoS queue is calculated as follows: 

CURR_BW_i = W1 × CURR_BW_i + W2 × ACT_BW_i;

where:

CURR_BW_i represents the current bandwidth for a
particular QoS queue,
W1 represents a first weighting factor,
W2 represents a second weighting factor, and
ACT_BW_i represents the actual bandwidth received by
the particular QoS queue in a previous time interval.

16. The method of claim 15, wherein W1=(W-1)/W,
W2=1/W, and the previous time interval is the most recent
time interval.

17. The method of claim 14, further comprising selecting 
an appropriate QoS queue, from which to transmit the next 
packet, from the first or the second group based upon 
relative queue priorities associated with the QoS queues. 

18. The method of claim 14, further comprising selecting
an appropriate QoS queue, from which to transmit the next 
packet, from the first or the second group based upon a round 
robin selection scheme. 

19. The method of claim 14, further comprising selecting 
the appropriate QoS queue, from which to transmit the next
packet, from the first or the second group based upon a least 
recently used (LRU) selection scheme. 

20. The method of claim 14, wherein the first group 
comprises QoS queues associated with a minimum queue 
bandwidth requirement that is less than the QoS queues'
current bandwidth metric, and wherein the second group
comprises QoS queues associated with a minimum queue
bandwidth requirement that is greater than or equal to the
QoS queues' current bandwidth metric. 

21. A packet forwarding device for use in a network
employing a non-deterministic access protocol, the packet
forwarding device comprising:

a filtering and forwarding engine configured to forward 
received packets based upon a traffic group with which 
the packet is associated; and 

a plurality of ports coupled to the filtering and forwarding 
engine, each port of the plurality of ports configured to 
receive packets from the filtering and forwarding 
engine, each port of the plurality of ports having a 
plurality of Quality of Service (QoS) queues associated 
with a minimum queue bandwidth requirement, each 
port of the plurality of ports further configured to 
schedule a packet for transmission onto the network by 
determining a current bandwidth metric for each of the 
plurality of QoS queues, 

dividing the plurality of QoS queues into at least a first 
group and a second group based upon the current 
bandwidth metrics and the minimum queue bandwidth 
requirements, and 

if at least one QoS queue of the plurality of QoS queues, 
so divided, is associated with the first group, then 
transmitting a packet from the at least one QoS queue; 
otherwise transmitting a packet from a QoS queue of 
the plurality of QoS queues associated with the second 
group. 

22. The packet forwarding device of claim 21, wherein the 
plurality of ports are further configured to select among QoS 
queues in the same group based upon relative queue priori- 
ties associated with the QoS queues. 

23. The packet forwarding device of claim 21, wherein the 
plurality of ports are further configured to select among QoS 
queues in the same group based upon a round robin selection 
scheme. 

24. The packet forwarding device of claim 21, wherein the 
plurality of ports are further configured to select among QoS 
queues in the same group based upon a least recently used 
(LRU) selection scheme. 

25. The packet forwarding device of claim 21, wherein the 
first group comprises QoS queues associated with a mini- 
mum queue bandwidth requirement that is less than the QoS 
queues' current bandwidth metric, and wherein the second 
group comprises QoS queues associated with a minimum 
queue bandwidth requirement that is greater than or equal to 
the QoS queues' current bandwidth metric. 

26. A method of bandwidth management and traffic pri- 
oritization for use in a network of devices, the method 
comprising: 

receiving at a packet forwarding device information 
indicative of one or more traffic groups, the information 
indicative of the one or more traffic groups including a 
virtual local area network (VLAN) identifier; 

receiving at the packet forwarding device information 
defining a quality of service (QoS) policy for at least 
one of the one or more traffic groups, the QoS policy 
including at least a minimum bandwidth; 

providing a plurality of queues at each of a plurality of 
output ports; 

associating the one or more traffic groups with the plu- 
rality of queues based upon the minimum bandwidth; 
and 

scheduling a packet for transmission from one of the 
plurality of queues onto the network. 

27. A machine-readable medium having stored thereon 
data representing sequences of instructions, said sequences 
of instructions which, when executed by a processor, cause
said processor to: 

receive at a packet forwarding device information indica- 
tive of one or more traffic groups; 

receive at the packet forwarding device one or more 
bandwidth parameters for at least one of the one or 
more traffic groups; 

receive at a first port of a plurality of ports a packet 
associated with the at least one traffic group; 

enqueue the packet onto a queue associated with the at 
least one traffic group; 

schedule the packet for transmission from a second port of 
the plurality of ports based upon the one or more 
bandwidth parameters for the at least one traffic group 
with which the packet is associated; by 
periodically evaluating a current bandwidth metric for 
the queue; by 

determining an actual bandwidth for a prior time 
period; 

determining a bandwidth metric for the prior time 
period; and 

combining a portion of the actual bandwidth for the 
prior time period with a portion of the bandwidth 
metric for the prior time period to arrive at the 
current bandwidth metric; and 
dequeuing the packet from the queue if the current 
bandwidth metric meets a predetermined relationship 
with the one or more bandwidth parameters. 
28. A machine-readable medium having stored thereon 
data representing sequences of instructions, said sequences 
of instructions which, when executed by a processor, cause 
said processor to: 

define at a packet forwarding device information indica- 
tive of one or more traffic groups; 
define at the packet forwarding device information indica- 
tive of a quality of service (QoS) policy for at least one 
of the one or more traffic groups; 
receive a packet at a first port of a plurality of ports; 
identify a first traffic group of the one or more traffic 

groups with which the packet is associated; 
provide a plurality of QoS queues; 
map the first traffic group to a first QoS queue of the 
plurality of QoS queues; and 
schedule the packet for transmission from a second port of
the plurality of ports based upon the QoS policy for the 
first traffic group, and wherein the scheduling is inde- 
pendent of end-to-end signaling; said scheduling 
including: 

determining a current bandwidth metric for each of the 

plurality of QoS queues; 
dividing the plurality of QoS queues into at least a first 
group and a second group based upon the current 
bandwidth metrics and a minimum bandwidth 
requirement associated with each of the plurality of 
QoS queues; and 
if the first group includes at least one QoS queue, then 
transmitting a packet from the at least one QoS 
queue; otherwise transmitting a packet from a QoS 
queue associated with the second group. 
29. A machine-readable medium having stored thereon
data representing sequences of instructions, said sequences
of instructions which, when executed by a processor, cause
said processor to:
provide a plurality of quality of service (QoS) queues at 
each of a plurality of output ports, each of the plurality 
of QoS queues associated with a minimum queue 
bandwidth requirement; 
add a packet to one of the plurality of QoS queues based 
upon a traffic group with which the packet is associated; 
and 

schedule a next packet for transmission onto the network 
from one of the plurality of QoS queues at a particular 
output port of the plurality of output ports by: 
determining a current bandwidth metric for each of the 

plurality of QoS queues, 
dividing the plurality of QoS queues into at least a first 
group and a second group based upon the current 
bandwidth metrics and the minimum queue band- 
width requirements, and 
if at least one QoS queue of the plurality of QoS 
queues, so divided, is associated with the first group, 
then transmitting a packet from the at least one QoS 
queue; otherwise transmitting a packet from a QoS 
queue of the plurality of QoS queues associated with 
the second group. 
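
The current bandwidth calculation recited in claims 15 and 16 is a weighted moving average of the previous metric and the bandwidth actually received in the most recent interval. A minimal sketch, assuming the weights W1=(W-1)/W and W2=1/W from claim 16; the function name `update_current_bandwidth` and its parameters are hypothetical:

```python
def update_current_bandwidth(curr_bw: float, actual_bw: float, w: float) -> float:
    """Return the new CURR_BW_i given the previous CURR_BW_i and ACT_BW_i.

    Uses W1 = (W - 1) / W and W2 = 1 / W, so a larger W makes the metric
    react more slowly to the most recent interval.
    """
    if w < 1:
        raise ValueError("W must be at least 1")
    w1 = (w - 1) / w
    w2 = 1 / w
    return w1 * curr_bw + w2 * actual_bw
```

With W = 4, a previous metric of 40 and an actual bandwidth of 80 yield 0.75 × 40 + 0.25 × 80 = 50. With W = 1 the metric simply tracks the most recent interval.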
A method of controlling packet traffic in an IP network of
originating, receiving and intermediate nodes to meet
performance objectives established by service level
agreements. Traffic statistics and performance data such as delay
and loss rates relating to traffic flows are collected at
intermediate nodes. A control server processes the collected
data to determine data flow rates for different priorities of
traffic. A static directory node is used to look up inter-node
connections and determine initial traffic classes
corresponding to those connections. The rates are combined with the
initial traffic classes to define codes for encoding the headers
of packets to determine their network priority.

ARCHITECTURE FOR SUPPORTING
SERVICE LEVEL AGREEMENTS IN AN IP 
NETWORK 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention generally relates to architectures for 
delivering networking products across multiple platforms, 
and in particular to architectures for delivering Internet 
Protocol (IP) networking products which are enabled to 
support service level agreements. 

2. Background Description 

The operators and users of enterprise networks prefer that 
their networks be predictable and provide consistent perfor- 
mance. Predictability and consistency are often more impor- 
tant than the raw capabilities of the network, i.e. a network 
that provides a consistent medium throughput is often con- 
sidered more desirable than a network which provides very 
high throughput at some times, but performs poorly at other 
times. For many business applications, it is important that 
transactions be completed in a predictable manner while the
time taken for the transactions to complete is relatively 
unimportant (provided it does not exceed a reasonable 
limit). 

Prior art solutions, such as SNA, provide network
predictability by preconfiguring the network. This does not
work in an IP network, because IP is dynamic and 
connectionless, and therefore relatively unpredictable. The 
typical enterprise network environment consists of several 
campus area networks interconnected by a wide area back- 
bone network. The campus networks usually deploy high- 
speed links, and perform reasonably well. Congestion tends 
to occur in the backbone network, which consists of rela- 
tively slower speed point-to-point links, and in some of the 
campus networks which house the servers. 

An approach is needed which will provide predictability 
on an IP backbone network, and do so for backbones with 
varying degrees of capability. If the network provider can 
predict the performance of the network, then he can imple- 
ment service level agreements. A service level agreement is 
a formal contract entered into by a service provider and its 
customers. The service provider contracts to transport pack- 
ets of electronic data between customer premise networks 
(branch offices, data centers, server farms, etc.) across the 
provider's backbone network with certain assurances on the 
quality of the transport. This is known as the Service Level 
Agreement (SLA). The SLA specifies customer expectations 
of performance in terms of parameters such as availability 
(bound on downtime), delay, loss, priority and bandwidth for 
specific traffic characteristics. An SLA includes acceptable 
levels of performance, which may be expressed in terms of 
response time, throughput, availability (such as 95% or 99% 
or 99.9%), and expected time to repair. 

SLAs vary greatly from one network to the next, and from 
one application to another running on the same network. 
They are normally based on some level of expected activity. 
For example, if a large airline wants to ensure that the lines 
at the ticket counter do not get overly long due to poor 
response time at the ticketing terminals, some estimate must 
be made of expected workload, so that the network admin- 
istrator can be prepared with the necessary resources to meet 
that workload and still remain compliant with the perfor- 
mance terms of the SLA. 

Managing an SLA is an important task because of the 
revenue implications of failure to support mission-critical 
business applications. The problem is exacerbated by the
diversity of the traffic and by the poor and varying degree of
service differentiation mechanisms within the backbone
networks. Commercially significant traffic must be
prioritized above workloads which do not have a critical time
dependency for the success of the business. Many of these
workloads in an IP environment are far more volatile than
those which have traditionally been encountered in the prior
art, e.g. in native SNA environments. In order to meet
customer requirements in this environment, a service
provider must provide a large excess capacity at
correspondingly high charges.

This situation dramatizes the need for effective tools
which can monitor the performance of the IP network or
system delivering a service over the IP network. While SLA
management tools already exist in the native SNA VTAM
environment, these tools do not generally exist for IP
backbones. Also, there is a need for effective controls which
allow the service provider of an IP network to manipulate
the priority of the various workloads to be managed.

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide
support for service-level agreements in a corporate intranet
or an ISP-controlled portion of the Internet.

It is also an object of the invention to provide tools which 
can monitor the performance of an IP network as measured 
against multiple SLA agreements. 

It is a further object of the invention to provide effective
controls which allow the service provider to manipulate the
priority of the various workloads subject to SLA
agreements.

Another object of the invention is to provide means for
achieving network predictability which are adequate to
implement a variety of SLA agreements over IP backbone
networks having a variety of capabilities.

It is yet another object of the invention to provide network
traffic control tools enabling optimum allocation of network
resources and minimizing the need to provide excess
capacity in order to implement a variety of SLA agreements.

This invention discloses an architecture (SLA
architecture) which organizes the key components, the
specific function placements and communication mechanisms
so as to enable efficient means of implementing new tools
which greatly facilitate both development and enforcement
of an SLA. Further, these advantages are even more
significant when the backbone network, such as current IP-based
networks, provides very little means for such service
differentiation.

The key components of a service level agreement are
availability and responsiveness. Availability is maintained
by managing network connectivity in the presence of
failures, and responsiveness by maintaining a satisfactory
level of network performance. In an IP network, availability
is largely taken care of by the adaptive routing mechanism
used by IP, but responsiveness needs to be managed. The
schemes that make the network predictable provide
mechanisms that can estimate the responsiveness of an IP network,
and thereby assist in implementing service level agreements.
The approach taken in accordance with the present invention
to provide predictability in an IP network is to provide a
quasi-static configuration which adapts to longer term
fluctuations of traffic and relies upon the dynamism of IP to react
properly to short term fluctuations and congestion.

Quasi-static adaptations may be viewed as dynamic in 
relation to longer time scales. By extending the adaptive 
time scales to relatively gross periods of hours, days and
weeks, as appropriate, a quasi-static configuration enables
the network to modify allocation of resources in such a
manner as to lower the load on the network, in contrast to
prior art techniques such as Resource Reservation Protocol
(RSVP) which allow necessary resources to be requested but
impose a higher signalling load on the network.

The invention involves controlling packet traffic in an IP
network of originating, receiving and intermediate nodes to
meet performance objectives established by service level
agreements. To implement the invention, traffic statistics and
performance data such as delay and loss rates relating to
traffic flows are collected at intermediate nodes. A central
server processes the collected data to determine rates for
different priorities of traffic. A static directory node is used
to look up inter-node connections and determine initial
traffic classes corresponding to those connections. The rates
are combined with the initial traffic classes to define codes
for encoding the headers of packets to determine their
network priority.

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1a is a block diagram of a service level agreement
scenario having a combined directory server and control
server; FIG. 1b shows the same scenario having a directory
server distinct from the control server.

FIG. 2 is an illustration of a hierarchical structure of a 
directory that can be used for Service Level Agreements. 

DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT OF THE INVENTION

Referring now to the drawings, and more particularly to
FIG. 1a, there is shown a Service Level Agreement scenario,
with a Network Operator 10, Customer Premise Networks
A1 and A2, Edge Devices E1 and E2, a Network Router R1
and a Directory Server/Control Server 11. FIG. 1b shows the
same scenario, but with Directory Server 11A distinct from
Control Server 11B.

The main components of the proposed architecture are:

Edge Device(s),

Control Server(s),

Directory Server(s),

Edge Device to Control Server Protocol,

Edge Device to Directory Server Protocol,

Control Server to Directory Server Protocol, and

End Host Protocol.
These will now be discussed in detail.

An Edge Device in the SLA architecture is a module that
interfaces a customer premise network with the backbone
network. Currently, backbone networks vary widely in their
resource management and service differentiation capabilities
(e.g. an IP network with support for resource reservation
and/or support for differential services using weighted fair
queuing (WFQ) or class based queuing (CBQ), an ATM or
Frame Relay network supporting switched virtual circuits
with committed rates, etc). Such heterogeneity is expected to
continue as vendors of networking equipment seek to
differentiate their products. In such an environment, edge
devices play the role of adapting the traffic entering the
backbone network to the specific capabilities provided by
the network in order to ensure that the SLA conditions are
met efficiently.

An Edge Device may reside on a stand-alone processing
device or be integrated into the border router that acts as a
gateway to the service provider network. In either case, all
packets originating in one customer premise network and
destined for another pass through two Edge Device
components; i.e., the ingress Edge Device E1 at the interface
between the backbone network and the source customer
premise network A1, and the egress Edge Device E2 at the
interface between the backbone network and the destination
customer premise network A2. Note that customer premise
networks A1 and A2 are not transit networks.

The ingress Edge Device E1 obtains some or all of the
following information, either carried in the packets received 
from the customer premise network, or obtained through a 
lookup based on information stored at the edge device: 

ingress interface, 

source address, 

source port, 

destination address, 

destination port, 

protocol id, 

Class of Service identification, 

contents of packet, 

header fields in transport protocol. 
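
As a rough illustration, the attributes listed above can be gathered into one record per packet, with the "lookup based on information stored at the edge device" modeled as a table search. This is only a sketch: the record layout, the `EGRESS_TABLE` contents, the addresses and the function name `lookup_egress` are assumptions made for the example and do not come from the specification.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class PacketInfo:
    """Per-packet attributes an ingress Edge Device may collect."""
    ingress_interface: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol_id: int
    tos: int  # Class of Service identification carried in the packet


# Hypothetical table stored at the edge device:
# destination subnet -> identity of the egress Edge Device.
EGRESS_TABLE = [
    (ip_network("10.2.0.0/16"), "E2"),
]


def lookup_egress(info: PacketInfo) -> Optional[str]:
    """Return the egress Edge Device for a packet, or None if unknown."""
    dst = ip_address(info.dst_addr)
    for subnet, egress in EGRESS_TABLE:
        if dst in subnet:
            return egress
    return None


pkt = PacketInfo("if0", "10.1.0.5", 40000, "10.2.3.4", 80, 6, 0)
egress = lookup_egress(pkt)  # "E2" for the table above
```

A linear scan is used only to keep the sketch short; a real device would more likely use a longest-prefix-match structure.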

An ingress Edge Device E1 performs some or all of the
following operations on packets that it intercepts as they
leave a customer premise network A1 and enter the backbone
network.

1. Classification: Packets are categorized into separate 
streams based on a number of criteria that depend on the 
terms of SLA and the network capabilities. The Edge Device 
uses a set of classification rules to determine the appropriate 
service level category to which the packet is assigned. These 
rules may be configured in the Edge Device or obtained by 
querying a Directory Server. The details of the latter mode 
of operation will be discussed below in the context of the 
Directory Server operation. In a preferred implementation, 
only the egress edge device classification and class of 
service classification are necessary to provide service level 
agreements. For finer granularity control, the other classifi- 
cations (path, channel, flow) can also be used. 

(a) Egress Edge Device Classification: The ingress Edge
Device E1 that receives the packet from the customer
premise network A1 obtains the identity of the remote
or egress Edge Device E2 that the packet is expected to
traverse before being delivered to the destination
customer premise network A2, either directly from the
packet or based on a lookup.

(b) Path Classification: The ingress Edge Device E1
determines the path that is expected to be traversed 
across the backbone network by the packet. 

(c) Class of Service classification: Packets with similar
designated service categories are considered to belong
to the same stream. The class of service may be determined
directly from information carried in the packet or may
be based on other header fields carried in the packet, or
based on a set of classification rules at the Edge Device.

(d) Channel classification: A channel is defined as a
stream of packets that have the same ingress and egress
edge devices, that are expected to follow the same path
through the network and have the same Class of
Service. The present invention also covers the case
where all packets expected to traverse the same remote
edge device are classified into the same channel,
irrespective of the expected path within the network.

(e) Flow classification: A flow is the basic packet stream
unit over which a service level agreement may be
specified. Typically, all packets in a flow belong to the
same channel.

2. Packet Transformation: The edge-device is responsible
for changing packet formats so that the backbone network is
capable of mapping the transformed packet into the right
class for purposes of scheduling and buffer management.
Two common forms of transformation are 1) to change the
contents of the TOS (Type of Service) field in the IP header;
or 2) to encapsulate the packet with an additional IP header.
When the backbone network supports a mechanism such as
WFQ or CBQ which operates on the TOS byte field, the
edge-device changes the contents of the TOS byte to a value
specific to the class of service assigned to a packet. When the
backbone network supports a mechanism such as RSVP
based reservation, or WFQ/CBQ which operate on the basis
of port numbers in the transport header, an encapsulation
into an additional IP header is required. The external IP
header would contain the right fields which would permit the
routers in the backbone network to classify it easily.

Packet transformation may need to be done both at the
entry and the egress edge-device for a packet. When only the
TOS byte value is changed, only the entry edge-device needs
to transform the packet. However, when an additional IP
header is used for encapsulation, the entry edge-device
transforms the packet by adding the external IP header, and
the egress edge-device transforms the packet by removing
the external IP header and restoring the original packet.

3. Scheduling: Scheduling refers to the differential
treatment given to different flows in terms of access to link
bandwidth. Typically, each outgoing flow at the edge device
is assured of the opportunity to share available link
bandwidth fairly with other contending flows. In this context, it
becomes important for Edge Devices E1 or E2 to schedule
packets awaiting access to the link, sending them out in a
certain order, perhaps different from the order in which the
packets were received. Scheduling may also aggregate
similar flows and arbitrate amongst the aggregates.

4. Buffer management: Another resource at the edge
device that needs to be shared is buffer space. Buffer
management refers to the operations of the edge device to
assure a fair treatment of different flows in terms of their
access to this resource, based on their priority and current
usage of buffers.

5. Policing: The SLA specifies the service levels that
individual flows should receive as long as the traffic
generated by these flows is within specified bounds. The policing
functionality checks for violation of the traffic contract by
flows and may penalize certain applications by degrading
their service level temporarily (marking/dropping all such
packets).

6. Pacing: During congestion states within the network,
certain channels may be affected because they use congested
portions of the network. As will be discussed later, the
Control Server component of the SLA architecture is
capable of detecting both the congestion state as well as
affected flows. Under the directive of the control server, an
Edge Device will regulate the rates of affected active
channels to alleviate the impact of congestion.

7. Statistics collection: An Edge Device maintains various
counters to monitor the traffic rates of flows in either
direction.

8. Traffic prediction: This involves using the collected
statistics to forecast near-term traffic (and the consequent
resource requirements) of flows that will enter the backbone
network from the Edge Device.

9. Performance monitoring: This includes estimating the
bandwidth, delay and loss characteristics of selected flows.
This function will be realized either using additional probe
packets or using header fields if data packets are
encapsulated before entering the backbone network. The frequency
of probing is adjusted according to the SLA terms while
maintaining a tight control over the overhead introduced by
such additional traffic. The latter is achieved by ensuring that
the overhead of probing does not exceed a certain
percentage of the actual data traffic which is monitored by the
statistics collection function. Performance monitoring can
be done at the egress edge device only, or at a combination
of the ingress and egress edge devices.

10. Policy control: This covers a variety of operations
performed at the edge of the network, including access
control, security, billing, etc. Network administrators may
wish to allow or disallow the use of network resources based
on the origin, destination or protocol used by the packet
stream. In addition, policy control may involve
authentication of applications and their desired service levels in an
environment where end-hosts are capable of signaling their
resource requirements/service priorities directly to the
network. This function involves communication with a
directory/policy server described below.

2. Control Server

A control server in the SLA architecture is a module that
acts as a repository of dynamic information (in accordance
with the above referenced "quasi-static" approach involving
adaptive time scales), e.g. resource utilization within a
portion of the backbone network. Based on the knowledge
of the topology, resource utilization and service level
agreements with all customer premise networks, the control
server computes the allocation of backbone network
resources, and informs the edge devices of the pacing that
must be done on various channels. To this end, the control
server may perform some or all of the following functions:

1. Network Topology Learning: The control server
becomes aware of the topology and total resource
availability at network elements. These network elements may
include various edge devices, routers, switches, bridges or
links between other network elements. The resources at the
network elements may include bandwidth, packet
processing speeds, buffers, and protocols to manage these resources.
The control server can obtain this information directly, i.e.,
through participation in routing protocols, or network
management protocols or through configuration; or indirectly,
i.e. from the Directory server or edge devices.

2. Updating Directory server: If the Control Server
obtains the topology and network resource information
directly then it updates the Directory Server accordingly.

3. Network Device configuration: In the event that the
resource management protocols at various network devices
are capable of remote configuration then the control server
may take on the task of configuring them accordingly. In
particular, the control server may adjust the parameters of
the link bandwidth scheduling protocols at various routers.
These adjustments will be propagated automatically, without
the need for rebooting the devices thereby reconfigured.

4. Determining routing topology: Periodically, the control
server obtains routing tables and other routing information
relevant to the routers and edge devices, in order to remain
aware of the latest routing topology of the backbone
network. This may be done through participation in routing
protocols or polling routers/edge-devices through network
management protocols.

5. Polling Edge Devices for Channel Statistics:
Periodically, the control server polls the edge devices, and
obtains the current set of active channels, the bandwidth
utilization of these channels, other traffic statistics that are
relevant to network resource use, as well as any estimates of
future resource use computed by the edge devices.

6. Load balancing: If the control server is capable of 
determining the routing in the network, then it uses this 
capacity to balance the load in the network. 

7. Congestion avoidance: The control server may detect 
that certain network devices, such as routers and links, are 
overloaded. In this case the control server will compute the 
channels that are over-using their resource allocations and 
inform the appropriate ingress edge devices corresponding 
to these channels, advising them to pace the channels. 
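
The congestion avoidance function can be pictured with a small sketch. The input layout (per-link capacities, and per-channel dictionaries holding a path, a measured rate, an allocation and an ingress device), the function name `channels_to_pace` and all numbers are assumptions chosen for illustration; the specification does not prescribe a data format.

```python
from typing import Dict, List


def channels_to_pace(link_capacity: Dict[str, float],
                     channels: List[dict]) -> Dict[str, List[str]]:
    """Map each ingress edge device to the channels it should pace."""
    # Sum the measured rate of every channel over each link on its path.
    load = {link: 0.0 for link in link_capacity}
    for ch in channels:
        for link in ch["path"]:
            load[link] += ch["rate"]

    # A link is treated as overloaded when its load exceeds its capacity.
    overloaded = {link for link, cap in link_capacity.items()
                  if load[link] > cap}

    # Advise pacing only for channels that exceed their allocation
    # and that cross at least one overloaded link.
    advice: Dict[str, List[str]] = {}
    for ch in channels:
        if ch["rate"] > ch["allocation"] and overloaded.intersection(ch["path"]):
            advice.setdefault(ch["ingress"], []).append(ch["name"])
    return advice


links = {"R1-E2": 100.0}
chans = [
    {"name": "c1", "ingress": "E1", "path": ["R1-E2"],
     "rate": 80.0, "allocation": 50.0},
    {"name": "c2", "ingress": "E3", "path": ["R1-E2"],
     "rate": 40.0, "allocation": 50.0},
]
advice = channels_to_pace(links, chans)  # only c1 exceeds its allocation
```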
3. Directory Server 

The directory server is responsible for maintaining
information which is relatively static. It is organized as per the
specifications of The Directory: Overview of Concepts,
Models and Service, CCITT Recommendation X.500, 1988.
The directory server holds the information used to classify
packets into one or more service-levels, policy information
regarding applications, and information about the different
service-levels that are to be expected by different customers.

The directory server is represented as an X.500 directory
and is accessible via the LDAP protocol (Lightweight Directory
Access Protocol, RFC 1777, March 1995, by W. Yeong,
T. Howes and S. Kille). It maintains three main types of
information: 

Information about the policy rules for applications and
users, mapping traffic to one of the several service
levels of the class.

Information about the mapping of service-levels to the
appropriate set of performance requirements.

Information about the topology of the network, and the
characteristics of the communication channels in the
network.

All the entries relevant to Service Level Agreements are 
rooted at the entry identifying the Network Operator 10. A
hierarchical structure of the directory that can be used for 
Service Level Agreements is illustrated in FIG. 2. 

Each entry in the directory belongs to one of the
categories shown above, and is assigned a relative distinguished
name at each step of the path it belongs to.

The category Network Operator 20 identifies the
organization that is responsible for this portion of the network. It
contains all the standard attributes associated with an
organization in a standard X.500 directory.

The Customer category 21 identifies different customers
supported by the network operator. Each customer has all the
attributes associated with an organization or an individual.
Additionally, it contains a list of all the associated interfaces
on which the customer packets can be received. These
interfaces are identified by their unique IP addresses within
the operator network.

The Interface category 22 identifies an interface through
which a customer may send its traffic. The access to an
interface might be via dial-up lines or via a directly
connected network. An interface is identified by its IP address,
and has a default service level which is assigned to its
owners. It also contains the name of the owner and the
physical machine on which it is installed. An interface entry
also contains the time when it was last updated.

The Principle category 23 identifies the type of rules that
are used to determine the Service Level 24 to which traffic
should be assigned. Each principle is a subset of the traffic
that will be assigned to a specific interface. Each entry in this
category has a type attribute which determines whether the
principle can be identified by looking at the Type of Service
(TOS) byte in an incoming IP packet, whether it can be
identified by looking at IP header information only, whether
it can be identified on the basis of IP and TCP/UDP headers,
or if the principle is identified by means of a URL or by
other upper layer information. It identifies the interface to
which the principle is applicable. It may be applicable to
more than one interface. A special interface address (IP addr
0.0.0.0) is used to identify a principle that is applicable to all
the interfaces belonging to a particular customer.

The Service Level attribute 24 identifies the service level
onto which the traffic matching the specifications of the
principle is mapped.

The specification of the principle is identified by the type
of principle. If the principle type is interface only, no further
specification is needed. All packets coming on the specified
interface are mapped to the specified service-level. If the
principle type is IP header only, it may contain the TOS byte
of the incoming packet which defines the principle. All
packets on the interface with specified TOS byte value are
mapped onto the specified service level. If the principle type
contains IP and TCP/UDP header, the additional attributes
would be the source-destination IP addresses and the TCP/
UDP port numbers which would need to be specified.

The Channel category 25 identifies the different logical
connections that are supported by the network operator for
a particular customer. The channel contains the source and
destination interface addresses, as well as the required
performance and traffic level that should be supported on the
channel. Other attributes of the channel can also be stored in
this entry. Both the desired and the actual observed
performance of the channel are stored in this entry.

The Subnet category 26 identifies the topology behind the
interface. The subnet is characterized by an IP address and
a subnet mask. Multiple subnets that belong to the customer,
and are accessible through the interface, are enumerated
here.

The Service Level category 24 identifies the different
service-levels that are offered by the network operator. An
entry in this category would contain the service-level name,
and the method used to encode the service in the backbone
network. The method used may specify the TOS byte to be
used by packets, or a specific port and protocol within which
such traffic should be encoded. It also contains the
performance specifics required for service-level, including the
round-trip delay or loss-rate. The performance specifics are
specified as an incremental add-on to the basic performance
that is determined by the properties of the source and
destination address for a specific communication channel.
The entry also specifies the action to be taken when traffic 
belonging to this service-level is found to be in violation of 
an assigned traffic rate. The action could take the form of 
dropping packets, buffering packets, downgrading packets 
to a different service-level, or downgrading sessions to a 
different service-level. In the latter case, the next service 
level also needs to be specified. 
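
The violation handling described for a service-level entry might be sketched as follows. The entry layout, the level names, the rate limits and the function name `handle_traffic` are hypothetical, and the sketch treats packet and session downgrades identically for brevity.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    DROP = "drop"
    BUFFER = "buffer"
    DOWNGRADE_PACKET = "downgrade_packet"
    DOWNGRADE_SESSION = "downgrade_session"


# Hypothetical service-level entries: assigned rate, violation action,
# and the next service level used when the action is a downgrade.
SERVICE_LEVELS = {
    "gold": {"max_rate": 10.0, "action": Action.DOWNGRADE_PACKET,
             "next_level": "best-effort"},
    "silver": {"max_rate": 5.0, "action": Action.DROP,
               "next_level": None},
    "best-effort": {"max_rate": float("inf"), "action": Action.DROP,
                    "next_level": None},
}


def handle_traffic(level: str, observed_rate: float) -> Tuple[str, str]:
    """Return (treatment, service level) for traffic at the observed rate."""
    entry = SERVICE_LEVELS[level]
    if observed_rate <= entry["max_rate"]:
        return ("forward", level)          # within the assigned rate
    action = entry["action"]
    if action in (Action.DOWNGRADE_PACKET, Action.DOWNGRADE_SESSION):
        # A downgrade needs the next service level from the entry.
        return ("forward", entry["next_level"])
    return (action.value, level)           # drop or buffer at this level


result = handle_traffic("gold", 12.0)  # over the limit, so downgraded
```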

The Backbone Topology class 27 is used to store the 
topology of the network and the network state. Below its 
hierarchy, one would find the different nodes 28 and links 29 
which will constitute the backbone network, and their state, 
including their present utilization, their capacity etc. In some 
implementations, it may not be necessary to support the full 
directory tree structure shown in FIG. 2. For example, the 
backbone topology need not be stored in cases where the 
control server determines backbone topology dynamically.
Similarly, if the network is dedicated to a single customer,
the customer class 21 can be eliminated. 
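
To make the Principle entries of this section more concrete, the following sketch matches a packet against a list of principles of the three header-based types (interface only, IP header only, IP and TCP/UDP headers). Several choices are assumptions rather than statements from the specification: the field names, treating unspecified address and port attributes as wildcards, taking the first matching principle, and falling back to the interface's default service level.

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_INTERFACES = "0.0.0.0"  # special address: applies to all interfaces


@dataclass
class Principle:
    kind: str                 # "interface", "ip_header" or "ip_transport"
    interface: str
    service_level: str
    tos: Optional[int] = None
    src_addr: Optional[str] = None
    dst_addr: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def matches(p: Principle, pkt: dict) -> bool:
    """Check whether a packet (a dict of header values) fits a principle."""
    if p.interface not in (ALL_INTERFACES, pkt["interface"]):
        return False
    if p.kind == "interface":
        return True
    if p.kind == "ip_header":
        return pkt["tos"] == p.tos
    if p.kind == "ip_transport":
        fields = ("src_addr", "dst_addr", "src_port", "dst_port")
        # Assumption: an attribute left as None matches any value.
        return all(getattr(p, f) is None or getattr(p, f) == pkt[f]
                   for f in fields)
    return False


def service_level_for(pkt: dict, principles: List[Principle],
                      default: str) -> str:
    """Return the service level of the first matching principle."""
    for p in principles:
        if matches(p, pkt):
            return p.service_level
    return default


principles = [
    Principle("ip_transport", "10.0.0.1", "gold", dst_port=80),
    Principle("ip_header", ALL_INTERFACES, "silver", tos=0x10),
]
web = {"interface": "10.0.0.1", "tos": 0, "src_addr": "10.1.0.5",
       "dst_addr": "10.2.3.4", "src_port": 40000, "dst_port": 80}
level = service_level_for(web, principles, "best-effort")
```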
4. Edge Device to Directory Server Protocol

The system can operate in one of two modes. In the first 
mode, the directory server and the control server are 
integrated, and are accessed using a directory access proto- 
col such as LDAP. In the second mode, the directory server 
and the control server are different entities (which may be
co-located on the same machine), each supporting a different
set of protocols. The directory server would always use a 
protocol such as LDAP, although the control server is likely 
to use a different protocol to counter the difficulties associ- 
ated with managing highly dynamic data using LDAP. 

In both of the modes, the edge-device registers itself with
the Directory Server when it is initiated. The edge-device
identifies the interface address to which it is attached,
and registers the subnets behind it with the directory server.
It also obtains the list of other interfaces belonging to
the same customer, and the subnets associated with them,
from the directory server. The edge-device then queries
the directory server for the different principles that are
defined for it. These principles provide the rules that map the
traffic on the network from the edge-device to the different
levels of service in the backbone network. The edge-device
also queries the set of channels belonging to the interface.
These provide the values for the different traffic
rates associated with each channel. Any other policy
information is also obtained by querying the directory server.

When the directory server is integrated with the control 
server, the edge-device would periodically query the direc- 
tory for the different channels for which an interface on the 
edge-device is an end-point. For these channels, the edge- 
device would obtain the specified maximum traffic to be 
placed in the network, as well as the specific performance 
expected of the channel. The edge-device only sends out 
packets according to these specifications into the network. 
The edge-devices would also monitor the performance of the 
packets, and update the status of the channel in the directory 
as to whether the performance specs are being met or not. 
5. Edge Device to Control Server Protocol

When the control server and directory server are
integrated, the protocol used between the edge-device and the
control server is as defined in the previous section. However,
when the control server is a separate entity, it uses a
different polling protocol to monitor the state of the network.

When it is initiated, the edge-device is required to register 
with the control server. The control server, at periodic 
intervals, polls each of the edge-devices to obtain the 
performance and traffic characteristics of the channels 
belonging to each edge-device. The control server would use 
this information to determine which edge-devices, if any, 
should be regulated, and allocate the pacing rate to the ones 
being regulated. The control server uses an adaptive 
dynamic algorithm to determine the rates, and the set of 
channels which need to be paced. A set of channels may need 
to be paced together since they all share the same bottleneck 
link. This mapping information is also sent by the control 
server. 

The rates to be assigned to different channels can be 
computed by the following adaptive dynamic algorithm, 
where the network state is collected at regular periods and
the algorithm is invoked at the end of each period. The 
algorithm seeks to determine the maximum utilization of a 
network node/link which would satisfy the performance 
requirements of all channels: 

1. For all the channels in the network, determine the
following quantities: a) mean actual traffic observed in the
previous period; b) forecasted mean traffic expected for the
next period; c) whether the channel's performance
requirements were satisfied in the last period.

2. On the basis of the foregoing collected information,
determine the utilization of each node and link in the
network for the last period. The utilization is given as the
ratio of actual traffic in the last period to the traffic capacity
of the node/link.

3. For all channels whose performance was not met in the
previous period, identify the node/link with the highest
utilization along its path. For this node/link, revise the maximum
utilization to be the lower of the current
maximum utilization and the actual utilization.

4. If all channels through a node/link have met their
performance in the last period, revise the maximum
utilization of the node/link to be the higher of current maximum
utilization and the actual utilization.

5. For each channel for which the maximum utilization of
each node/link along its path is larger than the predicted
utilization for the next period, make its assigned rate be the
same as the predicted rate.

6. For all nodes/links for which the predicted utilization in 
the network exceeds the maximum utilization in the 
network, repeat steps 7 through 8. 

7. Scale the assigned rate of the channels passing
through the node/link by the ratio of the maximum utilization to
the actual utilization.

8. Recompute the predicted utilization of the nodes/links 
in the network. 
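
The eight steps above leave some details open, so the following is only one possible reading, written as a sketch. Its assumptions: only links (not nodes) are modeled, every channel starts from its forecast rate, and in step 7 the "actual utilization" is taken to be the utilization implied by the currently assigned rates, which is recomputed in step 8. The function name `compute_rates` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def compute_rates(capacity: Dict[str, float],
                  max_util: Dict[str, float],
                  channels: List[dict]) -> Dict[str, float]:
    """Run one period of the rate computation; max_util is updated in place."""

    def utilization(rate_of) -> Dict[str, float]:
        load = {link: 0.0 for link in capacity}
        for ch in channels:
            for link in ch["path"]:
                load[link] += rate_of(ch)
        return {link: load[link] / capacity[link] for link in capacity}

    # Step 2: utilization of each link over the last period.
    actual = utilization(lambda ch: ch["actual"])

    # Step 3: channels that missed their performance lower the ceiling
    # of the most utilized link on their path.
    for ch in channels:
        if not ch["met"]:
            worst = max(ch["path"], key=lambda link: actual[link])
            max_util[worst] = min(max_util[worst], actual[worst])

    # Step 4: links whose channels all met performance may raise it.
    for link in capacity:
        through = [ch for ch in channels if link in ch["path"]]
        if through and all(ch["met"] for ch in through):
            max_util[link] = max(max_util[link], actual[link])

    # Step 5 (assumed starting point): assigned rate = forecast rate.
    rate = {ch["name"]: ch["forecast"] for ch in channels}

    # Steps 6 to 8: scale down channels on links predicted to exceed
    # their ceiling, then recompute. Rates only decrease, so each pass
    # settles at least one link.
    for _ in range(len(capacity)):
        predicted = utilization(lambda ch: rate[ch["name"]])
        over = [link for link in capacity
                if predicted[link] > max_util[link] * (1 + 1e-9)]
        if not over:
            break
        link = over[0]
        scale = max_util[link] / predicted[link]
        for ch in channels:
            if link in ch["path"]:
                rate[ch["name"]] *= scale
    return rate


capacity = {"L1": 100.0}
max_util = {"L1": 0.95}
channels = [
    {"name": "c1", "path": ["L1"], "actual": 50.0, "forecast": 60.0, "met": True},
    {"name": "c2", "path": ["L1"], "actual": 40.0, "forecast": 60.0, "met": False},
]
rates = compute_rates(capacity, max_util, channels)
```

In this example the last-period utilization of L1 is 0.9 and channel c2 missed its requirement, so the ceiling drops from 0.95 to 0.9. The forecast load of 120 would give a utilization of 1.2, so both channels are scaled by 0.9/1.2 and each receives a rate of about 45.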

6. End Host Protocol 

In the normal mode of operation, the end host would not
be expected to participate in any interaction with the
directory server or the control server. However, new applications
which are aware of the directory server can use a protocol to
query the directory and obtain their service-level
information. Some of the applications may be capable of marking
their service-levels in the end-host, and the edge-device
function would be to verify that the end-host marking is
consistent with the schema as defined by the directory. In
some cases, the end host may update the entries in the
directory which would be queried by the edge-device to
obtain the most current rules.

While the invention has been described in terms of a
single preferred embodiment, those skilled in the art will
recognize that the invention can be practiced with
modification within the spirit and scope of the appended claims.
Having thus described our invention, what we claim as 
new and desire to secure by Letters Patent is as follows: 

1. A method of controlling packet traffic in an IP network, 
comprising the steps of: 

at one or more edge-nodes in a connectionless network,
identifying inter-node connections and determining
their corresponding initial traffic classes and traffic
flows by looking up said initial traffic classes in a
relatively static directory server, said relatively static
directory server having a configuration that adapts to
longer term fluctuations of traffic;

transforming packets belonging to said inter-node
connections to encode information about said traffic
classes; and

for one or more of said traffic flows, regulating the
transmission rate of packets belonging to each of said
one or more traffic flows to meet performance
objectives according to service level agreements.

2. The method of claim 1, wherein said regulation of
transmission rate includes setting an upper limit for said rate.

3. A method of controlling packet traffic in an IP network,
comprising the steps of:

at one or more edge-nodes in a connectionless network,
identifying inter-node connections and determining
their corresponding initial traffic classes and traffic
flows;

transforming packets belonging to said inter-node
connections to encode information about said traffic
classes; and

for one or more of said traffic flows, regulating the
transmission rate of packets belonging to each of said
one or more traffic flows to meet performance
objectives according to service level agreements,

wherein said regulation of transmission rate includes
setting an upper limit for said rate, and

wherein said regulation of packet transmission rate
includes determining said rate by the further steps of:

at edge-nodes, collecting traffic statistics and
performance data relating to a traffic flow;

processing the information from said collecting step to
determine data flow rates for different priorities of
traffic;

repeating said collecting and processing steps at
periodic intervals, and propagating said rate to individual
nodes on said network in a quasi-static mode.

4. The method of claim 3, wherein said regulation of rates
is done by a control server which dynamically tracks traffic
in the network and determines rates for traffic flows at one
or more network nodes, said network nodes being either
edge-nodes or intermediate nodes.

5. The method of claim 4, wherein said control server 
configures devices on the network to efficiently support the 
rates for traffic flows at network nodes, said configuration 
being done when new rates are computed at the control 
server.

6. The method of claim 3, wherein said regulation of
packet transmission rate includes determining said rate by
storing and querying entries in a directory server.

7. A method of controlling packet traffic in an IP network,
comprising the steps of:

at one or more edge-nodes in a connectionless network,
identifying inter-node connections and determining
their corresponding initial traffic classes and traffic
flows;

transforming packets belonging to said inter-node
connections to encode information about said traffic
classes; and

for one or more of said traffic flows, regulating the
transmission rate of packets belonging to each of said
one or more traffic flows to meet performance
objectives according to service level agreements,

wherein said regulation of transmission rate includes
setting an upper limit for said rate, and

wherein said initial traffic classes and said encoding are
changed by edge-nodes to form new traffic classes and
new encoding for said inter-node connections.

8. The method of claim 7, wherein for each of one or more
of said inter-node connections said new traffic class is of a
lower priority than said initial traffic class, and said change
to said new traffic class is made when said inter-node
connection is sending packets in the network at a rate
beyond said upper limit.

9. An apparatus for controlling packet traffic in an IP
network, comprising:

at one or more edge-nodes in a connectionless network,
means for identifying inter-node connections and
determining their corresponding initial traffic classes and
traffic flows, said means for determining said initial
traffic classes further comprising means for looking up
said initial traffic classes in a relatively static directory
server, said relatively static directory server having a
configuration that adapts to longer term fluctuations of
traffic;

means for transforming packets belonging to said
inter-node connections to encode information about said
traffic classes; and

for one or more of said traffic flows, means for regulating
the transmission rate of packets belonging to each of
said one or more traffic flows to meet performance
objectives according to service level agreements.

10. An apparatus as in claim 9, wherein said means for
regulating the packet transmission rate includes means for
setting an upper limit for said rate.

11. An apparatus as in claim 10, wherein said regulating
means further comprises:

means for collecting statistics about traffic flows; and

means for collecting performance information about
traffic flows.

12. An apparatus as in claim 10, further comprising means
for changing the initial traffic class of an inter-node
connection to a new traffic class.

13. An apparatus as in claim 11, wherein said regulating
means further comprises means for determining if the
service level agreements of a traffic flow are being satisfied.

14. An apparatus as in claim 11, wherein said rates for said
traffic flows are regulated according to said traffic statistics
and said performance information.

15. The method of claim 9, wherein said identifying and 
transforming means are implemented at edge devices. 

16. An apparatus for computing the rates for traffic flows 
at individual nodes in an IP network, comprising: 

means for determining the routing topology of a connec- 
tionless network; 

means for collecting statistics about traffic flows in the 
network; 

means for collecting performance information about traf- 
fic flows in the network; and 

means for combining said routing topology with said 
statistics and said performance information to deter- 
mine the rates for traffic flows in the network to meet 
service level agreements, 

wherein said combining means is accomplished at
periodic intervals and said rates are disseminated to
network nodes in a quasi-static mode.

17. An apparatus as in claim 16, further comprising means 
for configuring network devices to optimally support said 
traffic rates. 

18. A method for optimizing resource utilization among 
customers of an IP network, comprising the steps of: 

defining service level agreements for each said customer; 

establishing a control server as a dynamic repository of 
network information, said information including 
resource utilization, topology, and service level agree- 
ments; 

receiving said topology information at said control server,
said topology information including edge devices 
through which said customers connect to the network; 

establishing a directory server as a quasi-static repository 
of network information, said information including 
policy rules for mapping traffic to service levels, and 
for mapping service levels to performance require- 
ments; 

monitoring traffic on said network at each of a plurality of
edge devices, said edge devices operating to classify 
said traffic; 






using said control server to compute the allocation of 
backbone network resources and issue pacing instruc- 
tions to said edge devices; and 

propagating directory server information to network 
devices automatically and without reconfiguring the
network, said propagation being accomplished dynami- 
cally over long time scales, 

wherein said network is connectionless. 

ABSTRACT



A system within a computer network identifies specific 
traffic flows originating from a given network entity and 
requests and applies appropriate policy rules or service 
treatments to the traffic flows. A network entity includes a 
flow declaration component that communicates with one or 
more application programs executing on the entity. The flow 
declaration component includes a message generator and an 
associated memory for storing one or more traffic flow data 
structures. For a given traffic flow, the application program 
issues one or more calls to the flow declaration component 
providing it with information identifying the traffic flows. 
The flow declaration component then opens a flow manage- 
ment session with a local policy enforcer that obtains policy 
rules or service treatments for the identified flow from a 
policy server and applies those rules or treatments to the 
specific traffic flows from the network entity. 

16 Claims, 10 Drawing Sheets 

METHOD AND APPARATUS FOR
IDENTIFYING NETWORK DATA TRAFFIC 
FLOWS AND FOR APPLYING QUALITY OF 
SERVICE TREATMENTS TO THE FLOWS 

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

This application is a continuation of application Ser. No. 
09/911,122, filed Jul. 23, 2001, now U.S. Pat. No. 6,434,624, 
which is a continuation of application Ser. No. 09/206,067,
filed Dec. 4, 1998, now U.S. Pat. No. 6,286,052, which is 
hereby incorporated by reference in its entirety. 

This application is related to the following U.S. patent 
application: 

U.S. patent application Ser. No. 09/179,036 entitled, 
METHOD AND APPARATUS FOR DEFINING AND 
IMPLEMENTING HIGH-LEVEL QUALITY OF SERVICE
POLICIES IN COMPUTER NETWORKS, filed Oct.
26, 1998, now U.S. Pat. No. 6,167,445 and assigned to the
assignee of the present application. 

FIELD OF THE INVENTION 

The present invention relates generally to computer 
networks, and more specifically, to a method and apparatus 
for identifying network data traffic flows and for applying
quality of service or policy treatments thereto. 

BACKGROUND OF THE INVENTION 

A computer network typically comprises a plurality of 
interconnected entities that transmit (i.e., "source") or 
receive (i.e., "sink") data frames. A common type of com- 
puter network is a local area network ("LAN") which 
typically refers to a privately owned network within a single 
building or campus. LANs employ a data communication 
protocol (LAN standard), such as Ethernet, FDDI or Token 
Ring, that defines the functions performed by the data link 
and physical layers of a communications architecture (i.e., a
protocol stack), such as the Open Systems Interconnection 
(OSI) Reference Model. In many instances, multiple LANs 
may be interconnected by point-to-point links, microwave 
transceivers, satellite hook-ups, etc. to form a wide area 
network ("WAN"), metropolitan area network ("MAN") or 
intranet. These LANs and/or WANs, moreover, may be 
coupled through one or more gateways to the Internet. 

Each network entity preferably includes network commu- 
nication software, which may operate in accordance with the 
well-known Transmission Control Protocol/Internet Protocol
(TCP/IP). TCP/IP basically consists of a set of rules defining
how entities interact with each other. In particular, TCP/IP
defines a series of communication layers, including a transport
layer and a network layer. At the transport layer, TCP/IP
includes both the User Datagram Protocol (UDP), which is a
connectionless transport protocol, and TCP which is a
reliable, connection-oriented transport protocol. When a 
process at one network entity wishes to communicate with 
another entity, it formulates one or more messages and 
passes them to the upper layer of the TCP/IP communication 
stack. These messages are passed down through each layer 
of the stack where they are encapsulated into packets and 
frames. Each layer also adds information in the form of a 
header to the messages. The frames are then transmitted over 
the network links as bits. At the destination entity, the bits 
are re-assembled and passed up the layers of the destination 
entity's communication stack. At each layer, the correspond- 
ing message headers are also stripped off, thereby recovering 
the original message which is handed to the receiving 
process. 

One or more intermediate network devices are often used
to couple LANs together and allow the corresponding entities
to exchange information. For example, a bridge may be
used to provide a "bridging" function between two or more
LANs. Alternatively, a switch may be utilized to provide a
"switching" function for transferring information, such as
data frames or packets, among entities of a computer network.
Typically, the switch is a computer having a plurality
of ports that couple the switch to several LANs and to other
switches. The switching function includes receiving data
frames at a source port and transferring them to at least one
destination port for receipt by another entity. Switches may
operate at various levels of the communication stack. For
example, a switch may operate at layer 2 which, in the OSI
Reference Model, is called the data link layer and includes
the Logical Link Control (LLC) and Media Access Control
(MAC) sub-layers.

Other intermediate devices, commonly referred to as
routers, may operate at higher communication layers, such
as layer 3, which in TCP/IP networks corresponds to the
Internet Protocol (IP) layer. IP data packets include a
corresponding header which contains an IP source address and
an IP destination address. Routers or layer 3 switches may
re-assemble or convert received data frames from one LAN
standard (e.g., Ethernet) to another (e.g., Token Ring). Thus,
layer 3 devices are often used to interconnect dissimilar
subnetworks. Some layer 3 intermediate network devices
may also examine the transport layer headers of received
messages to identify the corresponding TCP or UDP port
numbers being utilized by the corresponding network entities.
Many applications are assigned specific, fixed TCP
and/or UDP port numbers in accordance with Request for
Comments (RFC) 1700. For example, TCP/UDP port number
80 corresponds to the Hypertext Transfer Protocol
(HTTP), while port number 21 corresponds to File Transfer
Protocol (FTP) service.

Allocation of Network Resources 

Computer networks include numerous services and
resources for use in moving traffic throughout the network.
For example, different network links, such as Fast Ethernet,
Asynchronous Transfer Mode (ATM) channels, network
tunnels, satellite links, etc., offer unique speed and bandwidth
capabilities. Particular intermediate devices also
include specific resources or services, such as number of
priority queues, filter settings, availability of different queue
selection strategies, congestion control algorithms, etc.

Individual frames or packets, moreover, can be marked so
that intermediate devices may treat them in a predetermined
manner. For example, the Institute of Electrical and
Electronics Engineers (IEEE), in an appendix (802.1p) to the
802.1D bridge standard, describes additional information for
the MAC header of Data Link Layer frames. FIG. 1A is a
partial block diagram of a Data Link frame 100 which
includes a MAC destination address (DA) field 102, a MAC
source address (SA) field 104 and a data field 106. In
accordance with the 802.1Q standard, a user_priority field
108, among others, is inserted after the MAC SA field 104.
The user_priority field 108 may be loaded with a predetermined
value (e.g., 0-7) that is associated with a particular
treatment, such as background, best effort, excellent effort,
etc. Network devices, upon examining the user_priority
field 108 of received Data Link frames 100, apply the
corresponding treatment to the frames. For example, an
intermediate device may have a plurality of transmission
priority queues per port, and may assign frames to different
queues of a destination port on the basis of the frame's
user_priority value.
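
As an illustration of the queue assignment described above, the following minimal sketch maps a user_priority value to one of a port's transmission queues. The function name `select_queue` and the even, linear spread of priorities across queues are assumptions made for this example only; the text does not prescribe a particular mapping, and a real device may use a different table.

```python
def select_queue(user_priority: int, num_queues: int) -> int:
    """Return a queue index (0 = lowest) for a frame's user_priority value.

    Hypothetical helper: spreads the eight priority levels (0-7) evenly
    over the queues available at the destination port.
    """
    if not 0 <= user_priority <= 7:
        raise ValueError("user_priority must be in the range 0-7")
    if num_queues < 1:
        raise ValueError("a port needs at least one queue")
    # Integer arithmetic keeps the result in the range 0..num_queues-1.
    return user_priority * num_queues // 8


if __name__ == "__main__":
    for priority in range(8):
        print(priority, "->", select_queue(priority, num_queues=4))
```

With four queues, this sketch places priorities 0-1 in queue 0 and priorities 6-7 in queue 3; with eight queues each priority gets its own queue.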

FIG. 1B is a partial block diagram of a Network Layer
packet 120 corresponding to the Internet Protocol. Packet
120 includes a type_of_service (ToS) field 122, a protocol
field 124, an IP source address (SA) field 126, an IP
destination address (DA) field 128 and a data field 130. The
ToS field 122 is used to specify a particular service to be
applied to the packet 120, such as high reliability, fast
delivery, accurate delivery, etc., and comprises a number of
sub-fields (not shown). The sub-fields include a three bit IP
precedence (IPP) field and three one bit flags (Delay,
Throughput and Reliability). By setting the various flags, an
entity may indicate which overall service it cares most about
(e.g., Throughput versus Reliability). Version 6 of the Internet
Protocol (IPv6) similarly defines a traffic class field,
which is also intended to be used for defining the type of
service to be applied to the corresponding packet.
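
The sub-fields of the ToS octet can be pictured with a short decoding sketch. The bit positions used here follow the layout in RFC 791 (precedence in the three high-order bits, followed by the Delay, Throughput and Reliability flags); the names `TosFields` and `decode_tos` are hypothetical and appear only in this example.

```python
from typing import NamedTuple


class TosFields(NamedTuple):
    """Decoded sub-fields of an IPv4 ToS octet (RFC 791 layout)."""
    precedence: int         # three-bit IP precedence (IPP), 0-7
    low_delay: bool         # Delay flag
    high_throughput: bool   # Throughput flag
    high_reliability: bool  # Reliability flag


def decode_tos(tos: int) -> TosFields:
    """Split a ToS octet into its precedence value and three flags."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one octet")
    return TosFields(
        precedence=tos >> 5,
        low_delay=bool(tos & 0x10),
        high_throughput=bool(tos & 0x08),
        high_reliability=bool(tos & 0x04),
    )


if __name__ == "__main__":
    # 0xB0 = 1011 0000: precedence 5 with the Delay flag set.
    print(decode_tos(0xB0))
```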

Recently, a working group of the Internet Engineering 
Task Force (IETF), which is an independent standards 
organization, has proposed replacing the ToS field 122 of
Network Layer packets 120 with a one octet differentiated
services (DS) field 132 that can be loaded with a differentiated
services codepoint. Layer 3 devices that are DS
compliant apply a particular per-hop forwarding behavior to
data packets based on the contents of their DS fields 132.
Examples of per-hop forwarding behaviors include expedited
forwarding and assured forwarding. The DS field 132
is typically loaded by DS compliant intermediate devices
located at the border of a DS domain, which is a set of DS
compliant intermediate devices under common network
administration. Thereafter, interior DS compliant devices
along the path simply apply the corresponding forwarding
behavior to the packet 120.
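
A border device loading the DS field can be sketched as follows. The example assumes the layout later standardized in RFC 2474, in which the codepoint occupies the six high-order bits of the octet, and uses codepoint 46 for expedited forwarding; neither detail is stated in the text above, and the helper names `get_dscp` and `set_dscp` are hypothetical.

```python
EF_DSCP = 46  # expedited forwarding codepoint (assumption, per the IETF RFCs)


def get_dscp(ds_field: int) -> int:
    """Return the six-bit codepoint carried in a one-octet DS field."""
    return (ds_field & 0xFF) >> 2


def set_dscp(ds_field: int, dscp: int) -> int:
    """Return a DS octet carrying `dscp`, keeping the two low-order bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("a codepoint is six bits wide (0-63)")
    return (dscp << 2) | (ds_field & 0x03)


if __name__ == "__main__":
    marked = set_dscp(0x00, EF_DSCP)
    print(hex(marked), get_dscp(marked))
```

Interior devices would then only need `get_dscp` to select the forwarding behavior, which matches the division of labor between border and interior devices described above.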

FIG. 1C is a partial block diagram of a Transport Layer
packet 150. The transport layer packet 150 preferably
includes a source port field 152, a destination port field 154
and a data field 156, among others. Fields 152 and 154 are
preferably loaded with the predefined or dynamically
agreed-upon TCP or UDP port numbers being utilized by the
corresponding network entities.







Service Level Agreements 



To interconnect dispersed computer networks, many orga- 
nizations rely on the infrastructure and facilities of internet 
service providers (ISPs). For example, an organization may
lease a number of T1 lines to interconnect various LANs.
These organizations and ISPs typically enter into service
level agreements, which include one or more traffic specifiers.
These traffic specifiers may place limits on the amount
of resources that the subscribing organization will consume
for a given charge. For example, a user may agree not to
send traffic that exceeds a certain bandwidth (e.g., 1 Mb/s).
Traffic entering the service provider's network is monitored
(i.e., "policed") to ensure that it complies with the relevant
traffic specifiers and is thus "in-profile". Traffic that exceeds
a traffic specifier (i.e., traffic that is "out-of-profile") may be
dropped or shaped or may cause an accounting change (i.e.,
causing the user to be charged a higher rate). Another option
is to mark the traffic as exceeding the traffic specifier, but
nonetheless allow it to proceed through the network. If there
is congestion, an intermediate network device may drop
such "marked" traffic first in an effort to relieve the
congestion.
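
One common way to police traffic against a bandwidth limit is a token bucket. The sketch below is an assumption about how such a check might be written, not a mechanism described in the text: the class name `TokenBucketPolicer`, the burst allowance and the explicit `now` timestamp argument are all hypothetical. It only labels each packet as in-profile or out-of-profile; whether out-of-profile traffic is dropped, shaped, charged or marked is left to the caller.

```python
class TokenBucketPolicer:
    """Classify packets as in-profile or out-of-profile for a rate limit."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8.0  # tokens are counted in bytes
        self.burst_bytes = burst_bytes          # bucket depth
        self.tokens = float(burst_bytes)        # start with a full bucket
        self.last_time = 0.0                    # time of the previous check

    def check(self, packet_bytes: int, now: float) -> str:
        """Return "in-profile" or "out-of-profile" for a packet seen at `now`."""
        # Refill the bucket for the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(float(self.burst_bytes),
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "in-profile"
        return "out-of-profile"


if __name__ == "__main__":
    policer = TokenBucketPolicer(rate_bps=1_000_000, burst_bytes=3000)
    for t in (0.0, 0.0, 0.0, 0.02):
        print(t, policer.check(1500, now=t))
```

In this run the first two 1500-byte packets use up the burst allowance, the third arrives with an empty bucket and is out-of-profile, and the fourth is in-profile again because 20 ms at 1 Mb/s refills 2500 bytes.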



Multiple Traffic Flows 

A process executing at a given network entity, moreover, 
may generate hundreds if not thousands of traffic flows that
are transmitted across the corresponding network every day.
A traffic flow generally refers to a set of messages (frames
and/or packets) that typically correspond to a particular task,
transaction or operation (e.g., a print transaction) and may be 
identified by 5 network and transport layer parameters (e.g., 
source and destination IP addresses, source and destination 
TCP/UDP port numbers and transport protocol). 
Furthermore, the treatment that should be applied to these 
different traffic flows varies depending on the particular 
traffic flow at issue. For example, an online trading appli- 
cation may generate stock quote messages, stock transaction 
messages, transaction status messages, corporate financial 
information messages, print messages, data back-up 
messages, etc. A network administrator, moreover, may wish
to have very different policies or service treatments applied 
to these various traffic flows. In particular, the network 
administrator may want a stock quote message to be given 
higher priority than a print transaction. Similarly, a $1 
million stock transaction message for a premium client 
should be assigned higher priority than a $100 stock trans- 
action message for a standard customer. Most intermediate 
network devices, however, lack the ability to distinguish 
among multiple traffic flows, especially those originating 
from the same host or server. 
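
The limitation can be illustrated with a minimal sketch that groups packets by the five network and transport layer parameters named above. The packet dictionaries, the `task` label and the names `FlowKey`, `flow_key` and `group_by_flow` are hypothetical and serve only this example.

```python
from typing import Dict, List, NamedTuple


class FlowKey(NamedTuple):
    """The five parameters a device can read from packet headers."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def flow_key(packet: dict) -> FlowKey:
    """Build the five-parameter key for one packet."""
    return FlowKey(packet["src_ip"], packet["dst_ip"],
                   packet["src_port"], packet["dst_port"],
                   packet["protocol"])


def group_by_flow(packets: List[dict]) -> Dict[FlowKey, List[dict]]:
    """Group packets that share the same five header parameters."""
    flows: Dict[FlowKey, List[dict]] = {}
    for packet in packets:
        flows.setdefault(flow_key(packet), []).append(packet)
    return flows


if __name__ == "__main__":
    base = {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9",
            "src_port": 1098, "dst_port": 80, "protocol": "TCP"}
    packets = [
        dict(base, task="stock quote"),
        dict(base, task="print"),              # same headers, different task
        dict(base, src_port=2000, task="data back-up"),
    ]
    for key, members in group_by_flow(packets).items():
        print(key, [p["task"] for p in members])
```

The stock quote and the print job share one key, so a device that sees only headers cannot tell them apart. This is the gap the application-level parameters are meant to close.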

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide a method 
and apparatus for identifying one or more traffic flows from 
a source entity. 

It is a further object of the present invention to provide a 
method and apparatus for obtaining traffic policies to be 
applied to identified traffic flows. 

It is a further object of the present invention to manage 
traffic flows in accordance with corresponding policies. 

Briefly, the invention relates to a method and apparatus 
for identifying specific traffic flows originating from a 
network entity and for applying predetermined policy or 
service treatments to those flows. In particular, a network 
entity includes a flow declaration component that is coupled 
to one or more application programs executing on the entity. 
The network entity also includes a communication facility 
that supports message exchange between the application
program and other network entities. The flow declaration 
component includes a message generator and an associated 
memory for storing one or more traffic flow data structures. 
For a given traffic flow, the application program calls the 
flow declaration component and provides it with one or 
more identifying parameters corresponding to the given 
flow. In particular, the application program may provide 
network and transport layer parameters, such as IP source 
and destination addresses, TCP/UDP port numbers and 
transport protocol associated with the given traffic flow. It 
also provides one or more application-level parameters, such 
as a transaction-type (e.g., a stock transaction), a sub- 
transaction-type (e.g., a $1 Million stock purchase order), 
etc. The flow declaration component provides this information
to a local policy enforcer, which, in turn, may query a
policy server to obtain one or more policy or service
treatments that are to be applied to the identified traffic flow.
The local policy enforcer then monitors the traffic originating
from the network entity and, by examining IP source and
destination addresses, among other information, applies the
prescribed policy or service treatments to the given traffic
flow.

In the preferred embodiment, the application program and 
the flow declaration component at the network entity interact 
through an Application Programming Interface (API) layer,
which includes a plurality of system calls. In addition, the
flow declaration component generates and transmits one or
more application parameter declaration (APD) messages to
the local policy enforcer. The APD messages contain the
network and transport layer parameters (e.g., IP source and
destination addresses, TCP/UDP port numbers and transport
protocol) stored at the traffic flow data structure for the given
flow. The messages may also contain the application-level
parameters specified by the application program. The
information, moreover, may be in the form of objects
generated by the flow declaration component. Preferably, the
flow declaration component and the local policy enforcer
exchange messages in accordance with a novel protocol that
defines a message scheme in addition to a message format.
The local policy enforcer and the policy server may utilize
the Common Open Policy Service (COPS) protocol to
request and receive particular policies or service treatment
rules. Preferably, the policy server maintains or otherwise
has access to a store of network policies established by the
network administrator.

In another aspect of the invention, the local policy
enforcer may establish a traffic flow state that includes the
policy or service treatments specified by the policy server. It
then monitors the traffic flows originating from the network
entity looking for the given traffic flow. Once the given
traffic flow is identified, the local policy enforcer applies the
policy or service treatments set forth in the corresponding
traffic flow state. For example, the policy enforcer may mark
the packets or frames with a high priority DS codepoint.
When the given traffic flow is complete, the application
program may notify the flow declaration component, which,
in turn, signals the end of the traffic flow to the local policy
enforcer. The policy enforcer may request authorization
from the policy server to release or otherwise discard the
respective traffic flow state.

In an alternative embodiment of the invention, policy
rules may be cached at the local policy enforcer to eliminate
the need to query the policy server for each new traffic flow.

In another embodiment of the invention, the APD messages
are replaced with one or more enhanced Path or
Reservation messages as originally specified in the Resource
ReSerVation Protocol (RSVP).

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further advantages of the invention may be
better understood by referring to the following description in
conjunction with the accompanying drawings, in which:

FIGS. 1A-1C, previously discussed, are partial block
diagrams of network messages;

FIG. 2 is a highly schematic block diagram of a computer
network;

FIG. 3 is a highly schematic, partial block diagram of
local policy enforcer;

FIGS. 4A-4D are flow diagrams illustrating the message
scheme and tasks performed in identifying a traffic flow and
obtaining the corresponding policies;

FIGS. 5A-5B are highly schematic block diagrams illustrating
the preferred format of an application parameter
declaration message; and

FIG. 6 is a highly schematic block diagram illustrating an
enhanced Resource ReSerVation Protocol (RSVP) message
in accordance with the invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

FIG. 2 is a highly schematic block diagram of a computer
network 200. The network 200 includes a plurality of local
area networks (LANs) 202, 204 and 206 that are interconnected
by a plurality of intermediate network devices 208,
210. Coupled to the LANs are a plurality of entities, such as
end station 212 and print server 214. The network further
includes at least one policy server 216 that may be coupled
to a repository 218 and to a network administrator's station
220. A server suitable for use as policy server 216 is any
Intel x86/Windows NT® or Unix-based platform. The network
200 also includes at least one host or server 222
configured in accordance with the present invention.
In particular, the host/server 222 includes at least one
application program or process 224, a flow declaration
component 226 and a communication facility 228. The
flow declaration component 226 includes a message generator
230 that is in communicating relation with the communication
facility 228. Component 226 is also coupled to
an associated memory 232 for storing one or more traffic
flow data structures 234. The application program 224 is in
communicating relation with both the communication facility
228 and, through an Application Programming Interface
(API) layer 236, to the flow declaration component 226. The
communication facility 228, in turn, is connected to network
200 via LAN 206. The host/server 222 also comprises
conventional programmable processing elements (not
shown), which may contain software program instructions
pertaining to the methods of the present invention. Other
computer readable media may also be used to store the
program instructions.

The communication facility 228 preferably includes one
or more software libraries for implementing a communication
protocol stack allowing host/server 222 to exchange
messages with other network entities, such as end station
212, print server 214, etc. In particular, the communication
facility 228 may include software layers corresponding to
the Transmission Control Protocol/Internet Protocol (TCP/IP),
the Internet Packet Exchange (IPX) protocol, the AppleTalk
protocol, the DECNet protocol and/or NetBIOS
Extended User Interface (NetBEUI). Communication facility
228 further includes transmitting and receiving circuitry
and components, including one or more network interface
cards (NICs) that establish one or more physical ports to
LAN 206 or other LANs for exchanging data packets and
frames.

Intermediate network devices 208, 210 provide basic
bridging functions including filtering of data traffic by
medium access control (MAC) address, "learning" of a
MAC address based upon a source MAC address of a frame
and forwarding of the frame based upon a destination MAC
address or route information field (RIF). They may also
include an Internet Protocol (IP) software layer and provide
route processing, path determination and path switching
functions. In the illustrated embodiment, the intermediate
network devices 208, 210 are computers having transmitting
and receiving circuitry and components, including network
interface cards (NICs) establishing physical ports, for
exchanging data frames. Intermediate network device 210,
moreover, is preferably configured as a local policy enforcer
for traffic flows originating from host/server 222, as
described below.

It should be understood that the network configuration
200 of FIG. 2 is for illustrative purposes only and that the
present invention will operate with other, possibly far more
complex, network topologies. For example, the repository
218 and network administrator's station 220 may be directly
or indirectly connected to the policy server 216 (e.g.,
through one or more intermediate devices).

FIG. 3 is a partial block diagram of local policy enforcer
210. Local policy enforcer 210 includes a traffic flow state
machine engine 310 for maintaining flow states corresponding
to host/server 222 traffic flows, as described below. The
traffic flow state machine engine 310 is coupled to a communication
engine 312. The communication engine 312 is
configured to formulate and exchange messages with the
policy server 216 and the flow declaration component 226 at
host/server 222. That is, communication engine 312 includes
or has access to conventional circuitry for transmitting and
receiving messages over the network 200. The traffic flow
state machine engine 310 is also coupled to several traffic
management resources and mechanisms. In particular, traffic
flow state machine engine 310 is coupled to a packet/frame
classifier 314, a traffic conditioner entity 316, a queue
selector/mapping entity 318 and a scheduler 320. The traffic
conditioner entity 316 includes several sub-components,
including one or more metering entities 322, one or more
marker entities 324, and one or more shaper/dropper entities
326. The queue selector/mapping entity 318 and scheduler
320 operate on the various queues established by local
policy enforcer 210 for its ports and/or interfaces, such as
queues 330a-330e corresponding to an interface 332.

The term intermediate network device is intended broadly
to cover any intermediate device for interconnecting end
stations of a computer network, including, without
limitation, layer 3 devices or routers, as defined by Request
for Comments (RFC) 1812 from the Internet Engineering
Task Force (IETF), intermediate devices that are only partially
compliant with RFC 1812, intermediate devices that
provide additional functionality, such as Virtual Local Area
Network (VLAN) support, IEEE 802.1Q support and/or
IEEE 802.1D support, etc. Intermediate network device also
includes layer 2 intermediate devices, such as switches and
bridges, including, without limitation, devices that are fully
or partially compliant with the IEEE 802.1D standard and
intermediate devices that provide additional functionality,
such as VLAN support, IEEE 802.1Q support and/or IEEE
802.1p support, Asynchronous Transfer Mode (ATM)
switches, Frame Relay switches, etc.

FIGS. 4A-4D are flow diagrams illustrating a preferred
message scheme, relative to time t, in accordance with the
present invention. In general, application program 224 identifies
one or more anticipated traffic flows to the flow
declaration component 226, which, in turn, notifies the local
policy enforcer 210. The local policy enforcer 210 requests
and receives from the policy server 216 corresponding
policy or service treatments for the anticipated traffic flows.
Local policy enforcer 210 then monitors the traffic originating
from host/server 222 to identify those frames and/or
packets corresponding to the identified flows. When such a
flow is detected, local policy enforcer 210 applies the
specified policy or service treatments to corresponding data
frames and/or packets.

Identification of Traffic Flows

Assume that application program 224 is a stock transaction
program that can provide stock quotes to and process
stock transactions from remote clients, such as end station
212. The application program 224 preferably communicates
with end station 212 across network 200 through the communication
facility 228 at host/server 222 in a conventional
manner. Program 224 also communicates with the flow
declaration component 226 preferably through a plurality of
application programming interface (API) system calls to API
layer 236. These API calls are generally issued by the
program 224 along with one or more arguments and may be
returned by the flow declaration component 226.

In particular, upon initialization at host/server 222, the
application program 224 preferably issues a StartUp( ) API
call 410 to the API layer 236 at flow declaration component
226. Program 226 preferably loads the StartUp( ) call 410
with an application identifier that uniquely identifies application
program 224 to component 226 as an argument. The
application identifier may be a globally unique identifier
(GUID), which is a 128 bit long value typically provided by
the application developer, although other identifiers may
also be used (e.g., application name). The StartUp( ) call 410
may be returned by the flow declaration component 226 with
a version number as an argument. The version number
corresponds to the version of software being executed by the
flow declaration component 226. Other arguments, such as
the quality of service (QoS) and/or traffic management
resources that are available to traffic flows originating from
program 224, may also be returned by flow declaration
component 226.

For example, assume end station 212 contacts program
224 and requests a stock quote for a particular equity (e.g.,
IBM stock). Program 224 retrieves the requested
information and prepares a message containing the
requested stock quote for transmission to end station 212.
Before program 224 commences the traffic flow corresponding
to requested stock quote, it preferably issues a
NewBindings( ) call 412 to the API layer 236 of the flow
declaration component 226. The NewBindings( ) call 412 is
used to inform flow declaration component 226 of an
anticipated traffic flow to which some policy or service
treatments should be applied. In response to the
NewBindings( ) call 412, flow declaration component 226
generates a bindings handle, e.g., H1, and creates a traffic
flow data structure 234 within associated memory 232.
Component 226 also maps or associates the traffic flow data
structure 234 with the returned bindings handle H1. Flow
declaration component 226 also returns the NewBindings( )
call 412 to program 224 with the handle H1 as an argument.

Next, traffic flow data structure 234 is loaded with information
identifying the anticipated traffic flow. More
specifically, program 224 next issues one or more network
and transport layer parameter "Set" API calls 414. These Set
calls 414 are used by the flow declaration component 226 to
load traffic flow data structure 234 with network and transport
layer parameters, such as Internet Protocol (IP)
addresses and TCP/UDP port numbers. For example, program
224 may issue a SetSourcePort( ) call 414a using the
returned handle, H1, and the transport layer port number
(e.g., TCP port number 1098) to be utilized by program 226
as its arguments. In response, flow declaration component
226 loads the identified source port number (i.e., 1098) into
the traffic flow data structure 234 corresponding to handle
H1. Flow declaration component 226 may return an
acknowledgment to program 224 as an argument to the
SetSourcePort( ) call 414a. If a problem arises, flow declaration
component 226 may return an error message (e.g.,
insufficient memory, unknown handle, out of bound port
number, etc.) as the argument.

In a similar manner, program 224 preferably causes the
flow declaration component 226 to load the corresponding
traffic flow data structure 234 with its IP address, the
transport layer protocol (e.g., TCP) and the destination port
number and IP address of the receiving process at end station
212. More specifically, in addition to the SetSourcePort( )
call 414a, program 224 may issue one or more of the
following API system calls:

SetSourceIP( ) 414b;

SetTransportProtocol( ) 414c;

SetDestinationPort( ) 414d; and

SetDestinationIP( ) 414e.

Again, program 224 uses the previously returned handle,
H1, and the corresponding information (e.g., IP address,
transport protocol or port number) as arguments to these API
calls. As each Set API call 414 is received, the flow declaration component 226 loads the identified parameter into the traffic flow data structure 234. Flow declaration component 226 may similarly return the Set API call 414 with an error code or an acknowledgment as an argument. It should be understood that additional "Set" API calls 414 may be defined depending on the format of the included information. For example, by utilizing a SetSourceIPByLong( ) call (not shown), program 224 may specify its IP address as a 32 bit binary sequence. Alternatively, by utilizing a SetSourceIPByString( ) call (not shown), program 224 may specify its IP address in dotted decimal format (e.g., 128.120.52.123) or as a host name (e.g., name.department.company.domain). In addition, a single SetNetworkTransportParameters( ) system call may be defined to set all of the network and transport layer parameters at once.

It should be understood that application program 224 may obtain IP source and destination addresses, port numbers and transport protocol for use in communicating with end station 212 from the communication facility 228 in a conventional manner. It should be further understood that application program 224 may utilize one or more wildcards when specifying the network and transport layer parameters.

In addition to the network and transport layer parameters (e.g., source and destination IP addresses, transport protocol and source and destination TCP/UDP port numbers) which correspond to a particular flow of traffic, program 224 may specify other identifying characteristics and/or policy elements of the anticipated traffic flow. That is, program 224 may issue one or more application-level "Set" API calls 416 to the flow declaration component 226. For example, a SetInteger( ) call 416a may be used to specify some numerical aspect (e.g., the size of a file being transferred) of the anticipated traffic flow. The arguments of the SetInteger( ) call 416a include the handle H1, the numeric policy element (e.g., 786 Kbytes) and a policy element identifier (PID) that maps the numeric policy element to a particular type or class of information (e.g., file size). When the traffic flow data structure 234 is subsequently transferred to and processed by other entities, as described below, the PID will identify its corresponding information. In response to the SetInteger( ) call 416a, flow declaration component 226 loads the traffic flow data structure 234 with the numeric policy element and the PID. Flow declaration component 226 may return the SetInteger( ) call 416a to program 224 with an acknowledgment or error message as arguments.

Other application-level Set calls may also be defined. For example, a SetFloat( ) call 416b is used to associate a numeric value represented in floating decimal format with the anticipated traffic flow. A SetString( ) call 416c may be used to associate an alpha-numeric string with the anticipated flow. For example, if the anticipated traffic flow is to contain a video segment, program 224 may identify the name of the particular video segment and/or the viewer by utilizing the SetString( ) call 416c. Program 224 uses the handle H1 and the particular alpha-numeric string as arguments for the SetString( ) call 416c. A PID that maps an alpha-numeric string to the name of a video segment is also included. This information is similarly loaded into the corresponding traffic flow data structure 234 by the flow declaration component 226. A generic Set( ) call 416d may be used for specifying traffic flow characteristics that do not correspond to integer, floating decimal point or alpha-numeric string formats. For example, program 224 may specify a policy element in the well-known eXternal Data Representation (XDR) format. This XDR policy element is included as an argument in the Set( ) call 416d to the flow declaration component 226, which, in response, simply copies the XDR policy element into traffic flow data structure 234. The policy element may alternatively be specified using the well-known Abstract Syntax Notation One (ASN.1) format, or any other similar translation or encoding techniques.

The application-level parameters may encompass a whole range of information relating to different aspects of the traffic flow from the application program 224. For example, application-level parameters include such information as user name (e.g., John Smith), user department (e.g., engineering, accounting, marketing, etc.), application name (e.g., SAP R/3, PeopleSoft, etc.), application module (e.g., SAP R/3 accounting form, SAP R/3 order entry form, etc.), transaction type (e.g., print), sub-transaction type (e.g., print on HP Laser Jet Printer), transaction name (e.g., print monthly sales report), sub-transaction name (e.g., print monthly sales report on A4 paper), application state (e.g., normal mode, critical mode, primary mode, back-up mode, etc.). For a video streaming application, the application-level parameters might include user name, film name, film compression method, film priority, optimal bandwidth, etc. Similarly, for a voice over IP application, the application-level parameters may include calling party, called party, compression method, service level of calling party (e.g., gold, silver, bronze), etc. In addition, for World Wide Web (WWW) server-type applications, the application-level parameters may include Uniform Resource Locator (URL) (e.g., http://www.altavista.com/cgi-bin/query?pg=aq&kl=en&r=&search=Search&q=Speech+near+recognition), front-end URL (e.g., http://www.altavista.com), back-end URL (e.g., query?pg=aq&kl=en&r=&search=Search&q=Speech+near+recognition), mime type (e.g., text file, image file, language, etc.), file size, etc. Those skilled in the art will recognize that many other application-level parameters may be defined.

Application program 224 can also retrieve information stored at the traffic flow data structure 234 by issuing one or more Get API system calls 418 (FIG. 4B). For example, program 224 may issue a GetSourcePort( ) call 418a using the returned bindings handle H1 as an argument. In response, flow declaration component 226 parses the traffic flow data structure 234 and retrieves the source port information stored therein. Component 226 then returns the GetSourcePort( ) call 418a to program 224 with the source port as an argument. Program 224 may issue similar Get API calls to retrieve other network and transport layer parameters stored at the traffic flow data structure 234.

It should be understood that additional "Get" API system calls may be defined for retrieving application-level information from the traffic flow data structure 234.

After issuing the application-level Set API calls 416, if any, the corresponding traffic flow data structure 234 is complete. That is, data structure 234 has been loaded with each of the identifying characteristics specified by the application program 224 for the anticipated traffic flow.

In accordance with the invention, the flow declaration component 226 also opens a communication session with the local policy enforcer 210 and exchanges one or more Application Parameters Declaration (APD) messages. In the preferred embodiment, the flow declaration component 226 opens a reliable, connection-based "socket" session using the well-known Transmission Control Protocol (TCP) of the TCP/IP communication protocol stack. A "socket" is essentially an interface between the application and transport layers of a communication protocol stack that enables
the transport layer to identify which process it must communicate with in the application layer. A socket interfaces to a TCP/IP communication protocol stack via APIs consisting of a set of entry points into the stack. Applications that require TCP/IP connectivity thus use the socket APIs to interface into the TCP/IP stack. For a connection-oriented protocol (such as TCP), the socket may be considered a "session".

It should be understood that other protocols, including but not limited to connectionless protocols such as UDP, may be used to establish communication between the flow declaration component 226 and the local policy enforcer 210. Additionally, component 226 may communicate with local policy enforcer 210 at the network layer by addressing IP format APD messages to end station 212 (i.e., using the same destination address as the anticipated traffic flow) with the well-known Router Alert IP option asserted. Here, local policy enforcer 210 will intercept such asserted network layer packets and may act on them itself and/or forward them to some other network device.

Component 226 may be preconfigured with the IP address of the local policy enforcer 210 or it may dynamically obtain the address of a local policy enforcer. For example, component 226 or application program 224 may broadcast an advertisement seeking the IP address of an intermediate network device that is capable of obtaining and applying policy or service treatments to the anticipated traffic flow from program 224. Local policy enforcer 210 is preferably configured to respond to such advertisements with its IP address.

Component 226 may receive a "virtual" address that corresponds to a group of available local policy enforcers in a manner similar to the Standby Router Protocol described in U.S. Pat. No. 5,473,599, which is hereby incorporated by reference in its entirety. A single "active" local policy enforcer may be elected from the group to perform the functions described herein.

It should be further understood that the flow declaration component 226 preferably opens one TCP session with the local policy enforcer 210 per application program 224 per network interface card (NIC). More specifically, if host/server 222 is connected to network 200 through multiple LANs (each with a corresponding NIC), then traffic flows from program 224 may be forwarded onto any of these LANs. To ensure that the appropriate policy or service treatments are applied regardless of which LAN initially carries the flow, flow declaration component 226 preferably establishes a separate communication session with a local policy enforcer 210 through each LAN (i.e., through each NIC) for every program 224 that requests services from component 226.

In particular, flow declaration component 226 directs message generator 230 to formulate a Client Open message 420 for forwarding to the local policy enforcer 210. The Client Open message 420 establishes communication between the local policy enforcer 210 and the flow declaration component 226 and may be used to determine whether the local policy enforcer 210 has the resources to monitor the anticipated flow from the application program 224 and to apply the appropriate policy or service treatments. FIG. 5A is a block diagram of the preferred format of the Client Open message 420. In particular, the Client Open message 420 includes at least two elements: a header 510 and a timer area 512. The header 510 includes a version field 516, a flags field 518, an operation code field 520 and a message length field 524. It may also include one or more unused fields, such as field 522. Version field 516 preferably contains the version of the software being implemented at the flow declaration component 226. Flags field 518 preferably contains at least one flag that may be asserted or de-asserted by the flow declaration component 226, as described below. The operation code field 520 indicates the type of APD message. For a Client Open message 420, for example, field 520 is preferably loaded with the value "7". The message length field 524 specifies the length (in octets) of the Client Open message 420.

The timer area 512 includes a length field 526 which specifies the length (preferably in octets) of the timer area 512, a Class Number (C-Num) field 528, a Class Type (C-Type) field 530 and a Keep Alive Timer Value field 532. Timer area 512 may also include one or more unused fields, 534, 536. The Class Number field 528 is loaded with an agreed-upon value (e.g., "11") indicating that this portion of the Client Open message 420 (i.e., timer area 512) contains a keep alive timer value. Where multiple types may exist for a given class number, the Class Type field 530 is used to specify the particular type. Here, field 530 is preferably set to "1". Flow declaration component 226 preferably loads the Keep Alive Timer Value field 532 with a proposed time value (e.g., 30 seconds) to be used for maintaining the TCP session in the absence of substantive APD messages, as described below.

Message generator 230 preferably passes the Client Open message 420 down to the communication facility 228 where it is encapsulated into one or more TCP packets and forwarded to the local policy enforcer 210 in a conventional manner. The APD messages, such as the Client Open message 420, preferably use a well-known destination port number, such as 1022. The source port for the flow declaration component 226 may be dynamically agreed-upon when the TCP session with the local policy enforcer 210 is first established. At the local policy enforcer 210, message 420 is received at the communication engine 312 and passed up to the traffic flow state machine engine 310. The traffic flow state machine engine 310 examines the message 420, which it recognizes as a Client Open message due to the value (e.g., "7") loaded in the operation code field 520. Local policy enforcer 210 may first determine whether it has adequate resources to accept a new client. For example, local policy enforcer 210 may include an admission control module (not shown) that determines the percentage of time that its central processing unit (CPU) has remained idle recently, [...] policies associated with component 226, and the availability of its traffic management resources, such as meter 322, marker 324 and shaper/dropper 326, to manage additional traffic flows.

Assuming local policy enforcer 210 has sufficient available resources, it replies to the flow declaration component 226 with a Client Accept message 422. The format of the Client Accept message 422 is similar to the format of the Client Open message 420 shown in FIG. 5A. In particular, the Client Accept message 422 also includes a header that is similar to header 510 and a timer area that is similar to timer area 512. The operation code for the Client Accept message 422 (which is loaded in field 520) is another predefined value (e.g., "8") so that flow declaration component 226 will recognize this APD message as a Client Accept message. The traffic flow state machine engine 310 also loads a value in the Keep Alive Timer Value field 532 which may correspond to the value proposed by component 226 or may be a new value selected by the local policy enforcer 210.

The traffic flow state machine engine 310 hands the Client Accept message 422 to its communication engine 312 which
may encapsulate the message as required and forwards it to the host/server 222. At the host/server 222 the message is received at the communication facility 228 and passed up to the flow declaration component 226 where it is examined. Flow declaration component 226 examines the operation code field 520 and "learns" that it is a Client Accept message. Flow declaration component 226 also examines the keep alive timer field 532 to determine what value has been specified by local policy enforcer 210, which is used to generate additional APD messages, as described below.

It should be understood that the flow declaration component 226 may issue the Client Open message 420 as soon as the StartUp( ) call 410 is issued if not earlier.

When application program 224 is ready to begin transmitting the anticipated traffic flow (e.g., the IBM stock quote form) to end station 212, it issues a BeginFlow( ) call 424a to the flow declaration component. Preferably, the BeginFlow( ) call 424a is issued slightly before (e.g., 50 ms) program 224 begins forwarding the message to the communication facility 228. It should be understood, however, that the BeginFlow( ) call 424a may be issued at the same time as the anticipated flow to end station 212 is commenced or even slightly later. The application program 224 uses the previously returned handle H1 as an argument to the BeginFlow( ) call 424a. If program 224 wishes to receive any feedback regarding the policy or service treatments that are applied to the respective traffic flow, it may also assert a flag argument in the BeginFlow( ) call 424a and add one or more callback functions as additional arguments. The callback function preferably identifies an entry point in the application program 224 to which the requested feedback is to be returned. Program 224 may also load other information or data that will simply be returned to it with the requested feedback to assist program 224, for example, in mapping the returned feedback to a particular task.

The BeginFlow( ) call 424 is received and examined by the flow declaration component 226, which, in part, determines whether the feedback flag has been set. If so, it also looks for any callback functions and information arguments specified by program 224. Flow declaration component 226 may also return a flow handle, H2, to program 224 as an argument to the BeginFlow( ) call 424. Component 226 may also return an acknowledgment or error message as additional arguments. Assuming that the BeginFlow( ) call 424 did not cause any errors, flow declaration component 226 then directs its message generator 230 to formulate a Flow Start APD message 426.

FIG. 5B is a block diagram of a preferred Flow Start message 426, which is similar to the Client Open message 420. In particular, the Flow Start message 426 includes a header 510 having a flags field 518 and an operation code field 520, among others. If program 224 requested policy feedback, then message generator 230 preferably asserts the flag in field 518. In addition, the operation code field 520 is preferably loaded with the value "1" to indicate that this particular APD message is a Flow Start message 426.

Following the header 510 is a handle area 540, which includes a length field 542 (specifying the length of the handle area 540), a Class Number (C-Num) field 544, a Class Type (C-Type) field 546, a device handle field 548 and a flow handle field 550. The C-Num field 544 is loaded with an agreed-upon value (e.g., "1") indicating that this portion of the Flow Start message 426 contains a flow handle. The C-Type field 546 may also be set to "1". The device handle field 548 preferably contains a 2 octet identifier selected by the local policy enforcer 210 during establishment of the communication session. For example, the device handle may be "1327". The flow handle field 550 preferably contains the flow handle H2 generated by the flow declaration component 226 in response to the BeginFlow( ) call 424.

Following the handle area 540 are a plurality of policy bindings 552, such as policy bindings 552a, 552b and 552c. The policy bindings 552 contain encoded versions of the information stored in the traffic flow data structure 234 that corresponds to the flow handle specified in field 550. Each policy binding 552, moreover, has two elements, a policy identifier element 554 and an encoded policy instance element 556. Basically, the policy identifier element 554 identifies the type or instance of policy element that is contained in the associated encoded policy instance element 556. Each policy identifier element 554 includes a plurality of fields, including a length field 558 (specifying its length), a policy identifier (Policy ID) type field 560 and a policy identifier field 562. Each encoded policy instance element 556 similarly includes a plurality of fields, including a length field 564 (specifying its length), an encapsulation type field 566 and an encoded policy element field 568.

The first policy binding 552a, for example, may contain an encoded copy of the source port identified by program 224 with the SetSourcePort( ) call 414a and stored at the respective traffic flow data structure 234. More specifically, message generator 230 loads policy identifier field 562a with the type or instance of the policy element (e.g., "source port"). In the preferred embodiment, this name is a Policy Identifier (PID) as specified in the Internet Engineering Task Force (IETF) draft document COPS Usage for Differentiated Services submitted by the Network Working Group, dated December 1998, and incorporated herein by reference in its entirety. A PID specifies a particular policy class (e.g., a type of policy data item) or policy instance (e.g., a particular instance of a given policy class) in a hierarchical arrangement. The Policy ID type field 560a contains a predefined value reflecting that field 562a contains information in PID format. Component 226 preferably includes a Policy Information Base (PIB) for use in deriving the particular policy identifiers, as described in COPS Usage for Differentiated Services.

The message generator 230 then accesses the source port information from the respective traffic flow data structure 234 and translates it into a machine-independent format suitable for transmission across network 200. For example, the source port information may be translated in accordance with the ASN.1 translation technique. The encapsulated version of the source port is then loaded in the encoded policy element field 568a of binding 552a. The encapsulation type field 566a contains a predefined value reflecting that the information in field 568a has been encapsulated according to ASN.1. Message generator 230 similarly builds additional bindings 552 that contain encapsulated versions of the source IP address, transport protocol, destination port number and destination IP address as specified by program 224 in API calls 414b-414e and stored at traffic flow data structure 234. Message generator 230 also formulates separate bindings 552 for each of the application-level data items established by the application program 224 through application-level API calls 416. Again, each of these application-level data items may be identified by a corresponding PID which is loaded in the policy identifier field 562 of the respective binding 552. The application-level data item is then translated into a machine-independent format (e.g., through ASN.1) and loaded in the respective encoded policy element field 568, as described above.

It should be understood that other translation techniques, such as XDR, may also be used. It should be further
understood that the contents of other fields, including policy identifier field 556, should be similarly translated into machine-independent format.

The Flow Start message 426 is then handed down to the communication facility 228 for transmission to the local policy enforcer 210. At the local policy enforcer 210, the message 426 is captured by the communication engine 312 and handed to the traffic flow state machine engine 310 which parses the operation code field 520 to determine that the message is a Flow Start APD message. In response, the local policy enforcer 210 proceeds to obtain the particular policy rules or service treatments that are to be applied to this flow (e.g., a stock quote form for IBM). In particular, the local policy enforcer 210 formulates a Request Policy message 428 for transmission to the policy server 216. In the preferred embodiment, the format of the Request Policy message 428 corresponds to the Request message of the Common Open Policy Service (COPS) Protocol specified in the IETF draft document The Common Open Policy Service (COPS) Protocol, dated Aug. 6, 1998, and incorporated herein by reference in its entirety.

According to the COPS protocol, Request messages include a plurality of flags, such as a request type flag and a message flag, and a plurality of objects. The request type flag for message 428 is preferably set to the COPS value that corresponds to "Incoming-Message/Admission Control Request" type COPS messages and the message type flag should be set to "1". Furthermore, the "In-Interface" object of the Request Policy message 428 is preferably set to the VLAN designation associated with the local policy enforcer's interface at which the Flow Start message 426 was received. The bindings 552 of the Flow Start message 426, which may not be meaningful to the local policy enforcer 210, are preferably loaded (i.e., copied as opaque objects) into the Client Specific Information (ClientSI) object portion of the Request Policy message 428. The local policy enforcer 210 also loads a unique handle that identifies the anticipated traffic flow from program 224 into the Request Policy message 428. This handle, moreover, is used in all messages exchanged between the local policy enforcer 210 and the policy server 216 for this anticipated traffic flow. The handle may be the flow handle H2 previously returned by the flow declaration component 226.

It should be understood that intermediate network devices, such as local policy enforcer 210, may learn of the identity of the policy server 216 through any conventional means, such as manual configuration or a device configuration protocol.

The Request Policy message 428 is received at the policy server 216, which examines the network parameters specified for the anticipated traffic flow, including the IP addresses, port numbers and transport protocol. The policy server 216 also examines the application-level parameters specified by program 224 and provided to the policy server 216 in the Request Policy message 428. Based on this information, the policy server 216 makes a decision regarding the policy rules or service treatments to be applied to this traffic flow. For example, as described in co-pending U.S. patent application Ser. No. 09/179,036, which is hereby incorporated by reference in its entirety, the policy server 216 may obtain information from the repository 218 and/or network administrator via end station 220 and, in response, formulate one or more traffic management rules, such as classification, behavioral or configuration rules. More specifically, server 216 may formulate one or more classification rules for instructing the local policy enforcer 210 to classify data packets and frames from this traffic flow with a given DS codepoint, IP Precedence and/or user priority. Policy server 216 may also formulate one or more behavioral rules that instruct the local policy enforcer 210 to map packets with the given DS codepoint to a particular queue (e.g., 330d) and to apply a particular scheduling algorithm (e.g., WFQ). These policy decisions or rules are then loaded into a Policy Decision message 430 and sent from the policy server 216 to the local policy enforcer 210.

Communication engine 312 captures the Policy Decision message 430 and forwards it to the traffic flow state machine engine 310, which, in turn, extracts the policy decisions or rules contained in the message 430. Traffic flow state machine engine 310 preferably establishes a flow state (not shown) for the anticipated traffic flow that includes information identifying the anticipated traffic flow (such as IP addresses, port numbers and transport protocol) and the policy decisions or rules to be applied to that traffic. Traffic flow state machine engine 310 may also build one or more data structures (such as tables) to store the mappings contained in the Policy Decision message 430.

As packets or frames are received at the local policy enforcer 210, they are examined by the packet/frame classifier 314. More specifically, the packet/frame classifier 314 parses the source and destination port fields 152, 154 (FIG. 1C) and the IP source and destination address fields 126, 128 and the protocol field 124 (FIG. 1B). This information is then supplied to the traffic flow state machine engine 310, which determines whether a traffic flow state has been established for such packets or frames. Assuming the packets or frames correspond to the anticipated flow from the program 224 to end station 212 (e.g., the IBM stock quote form), a traffic flow state will exist and have associated policy rules or service treatments as specified in the Policy Decision message 430 from policy server 216. Local policy enforcer 210 then applies the specified treatments to these packets or frames. For example, the traffic flow state machine engine 310 may instruct the packet/frame classifier to set the DS field 132 (FIG. 1B) of such packets or frames to a value associated with best effort traffic. Similarly, the traffic flow state machine engine 310 may instruct the queue selector/mapping entity 318 to place these packets or frames in a particular (e.g., moderate priority) queue. Alternatively or in addition, packet/frame classifier may be instructed to load the ToS field 122 (FIG. 1B) or the user_priority field 108 (FIG. 1A) with predetermined values to implement these treatments at other intermediate network devices, such as device 208.

To the extent the application program 224 requested feedback as to the policy or service treatments applied to this traffic flow, the local policy enforcer 210 may formulate and send one or more Decision Feedback APD messages 432 to the flow declaration component 226. The Decision Feedback message 432 is similar in format to the Flow Start message 426. In particular, the Decision Feedback message 432 has a header 510 and a handle area 540. For Decision Feedback messages 432, the operation code field 520 is preferably loaded with the value "3". Appended to the handle area 540 are one or more decision bindings (not shown) that are similar in format to the policy bindings 552. In particular, each decision binding contains a treatment specified by the policy server 216 and applied by the local policy enforcer 210. For example, a first decision binding may provide that the specified traffic flow is being marked with a particular DS codepoint. Other decision bindings may specify the IP Precedence or user_priority values being entered in fields 122, 108, respectively, of this traffic flow. Other decision bindings may be more abstract and describe abstract service
classes granted to the traffic flow. The Decision Feedback
message 432 is received at the communication facility 228
and passed up to the flow declaration component 226. The
flow declaration component 226 extracts the particular treat-
ments from the decision bindings and returns them to the
application program 224 through a callback function 434
specified by the application program 224 in the
BeginFlow( ) call 424.

In order to maintain the TCP session established between 
the flow declaration component 226 and the local policy 
enforcer 210, the flow declaration component 226 may send 
one or more Keep Alive APD messages 436. The Keep Alive 
message 436 simply includes a header 510 with the opera- 
tion code field set to "9" and the message length field 524 set 
to "0". Flow declaration component 226 preferably sends at 
least one Keep Alive message 436 within every time period 
specified in the keep alive timer value field 532 of the Client 
Accept message 422. 
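
The Keep Alive exchange can be sketched as follows. The text fixes only the operation code ("9") and the message length ("0"); the byte layout of header 510 is not restated in this passage, so the one-byte opcode, one pad byte and two-byte length used below are assumptions made for illustration, as are the function names and the safety margin.

```python
import struct

# Operation codes stated in the text; other APD opcodes are omitted here.
OP_FLOW_END = 2
OP_DECISION_FEEDBACK = 3
OP_KEEP_ALIVE = 9
OP_CLIENT_CLOSE = 10

# Hypothetical header layout: opcode (1 byte), pad (1 byte), length (2 bytes).
HEADER = struct.Struct("!BxH")


def build_keep_alive() -> bytes:
    """Return a header-only message with opcode 9 and message length 0."""
    return HEADER.pack(OP_KEEP_ALIVE, 0)


def keep_alive_due(last_sent: float, now: float, timer_value: float,
                   margin: float = 0.5) -> bool:
    """True once a fraction (margin, an assumed value) of the keep alive
    period has elapsed, so that at least one message falls in every period."""
    return now - last_sent >= timer_value * margin
```

Sending well before the full period expires is a design choice of this sketch, not something the text requires.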

It should be understood that the policy server 216 may 
unilaterally send a Decision Change message 438 to the 
local policy enforcer 210 if a change in the previously 
supplied policy rules or service treatments occurs after the 
Policy Decision message 430 was sent. For example, the 
policy server 216 may obtain up-dated information from the 
repository 218 or from the network administrator through 
end station 220. This up-dated information may affect the 
policy rules or service treatments previously supplied to the 
local policy enforcer 210. In response, the policy server 216 
preferably formulates and sends the Decision Change
message 438. The format of the Decision Change message 438
is preferably the same as the Policy Decision message 430. 
The Decision Change message 438 is similarly captured at 
the communication engine 312 of the local policy enforcer 
210 and forwarded to the traffic flow state machine engine 
310. 

To the extent the Decision Change message 438 includes 
new policy rules or service treatments, the traffic flow state 
machine 310 preferably up-dates its traffic flow state accord- 
ingly. In addition, the traffic flow state machine 310 applies 
the up-dated policy rules or service treatments to subse- 
quently received packets or frames that correspond to the 
traffic flow. The local policy enforcer 210 may also generate 
and send a Decision Feedback message (like message 432) 
to component 226 if feedback was requested by program 
224. 
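
One way to picture the update step is a table of per-flow state held at the enforcer. The sketch below is only an illustration: the class, the key fields and the treatment names (for example "dscp") are hypothetical and are not taken from the specification.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Network and transport layer parameters identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class FlowStateTable:
    """Hypothetical per-flow state kept by a local policy enforcer."""

    def __init__(self) -> None:
        self._treatments: Dict[FlowKey, Dict[str, int]] = {}

    def on_policy_decision(self, key: FlowKey, treatments: Dict[str, int]) -> None:
        # The initial decision establishes the state for the flow.
        self._treatments[key] = dict(treatments)

    def on_decision_change(self, key: FlowKey, treatments: Dict[str, int]) -> None:
        # Only the treatments carried in the change overwrite earlier ones.
        self._treatments.setdefault(key, {}).update(treatments)

    def treatment_for(self, key: FlowKey) -> Optional[Dict[str, int]]:
        # Consulted for each subsequently received packet of the flow.
        return self._treatments.get(key)
```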

The policy server 216 may also transmit one or more 
Decision messages to other intermediate network devices, 
such as device 208, that are along the path of the anticipated 
traffic flow from host/server 222 to end station 212. These 
Decision messages similarly inform the intermediate net- 
work devices as to what policy rules or service treatments to 
apply to the traffic flow from program 224, which presum- 
ably has already been classified by the local policy enforcer 
210. Policy server 216 is thus able to provide end-to-end 
quality of service support. 

It should be understood that the local policy enforcer 210 
and the policy server 216 may exchange additional COPS 
messages as required, such as COPS Client Open and COPS 
Client Accept messages among others. 

The local policy enforcer 210 may also send one or more 
Keep Alive APD messages 440 to the flow declaration 
component 226 at the host/server 222. The Keep Alive 
message 440 from the local policy enforcer 210 preferably 
has the same format as Keep Alive message 436 from 
component 226. 

It should be further understood that the application program
224 may change certain characteristics associated with
the traffic flow if the nature of the flow changes over time.
For example, after reviewing the quote for IBM stock, the
user at end station 212 may decide to place a "buy" order for
IBM stock. In response, application program 224 may
transmit a stock transaction form. Furthermore, the policies
or service treatments to be applied to the traffic flow
corresponding to the stock quote form may be very different from
the treatments that should be applied to the traffic flow
corresponding to the stock transaction form. Accordingly,
the program 224 may issue one or more new application-level
Set API calls 442. For example, the program may issue
a SetInteger( ) call 442a, a SetString( ) call 442b, a SetFloat( )
call 442c and/or a Set( ) call 442d. These calls are generally
the same as the previously described application-level Set
API calls 416 and, although the program 224 utilizes the
previously returned handle H1 as an argument, it enters new
or updated information (e.g., stock transaction versus stock
quote forms). In response, the flow declaration component
226 overwrites the corresponding entries in the respective
traffic flow data structure 234 with the new or up-dated
information.
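
The overwrite behavior described above can be illustrated with a small sketch. The argument lists of the Set API calls are not given in this passage, so the method names, parameters and handle allocation below are hypothetical.

```python
from typing import Dict, Union

Value = Union[int, float, str]


class FlowDeclarationSketch:
    """Hypothetical stand-in for the flow declaration component."""

    def __init__(self) -> None:
        self._structures: Dict[int, Dict[str, Value]] = {}
        self._next_handle = 1

    def new_bindings(self) -> int:
        """Create an empty traffic flow data structure and return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._structures[handle] = {}
        return handle

    def set_string(self, handle: int, name: str, value: str) -> None:
        # A repeated call with the same name overwrites the earlier entry.
        self._structures[handle][name] = str(value)

    def set_integer(self, handle: int, name: str, value: int) -> None:
        self._structures[handle][name] = int(value)

    def bindings(self, handle: int) -> Dict[str, Value]:
        """Return a copy of the entries that would be sent as bindings."""
        return dict(self._structures[handle])
```

Calling `set_string` twice with the same handle and name leaves only the later value, which mirrors the stock quote form being replaced by the stock transaction form.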

The application program 224 then issues a
BeginUpdatedFlow( ) call 444 at or about the time that it
begins forwarding the stock transaction form to the user at
end station 212. The BeginUpdatedFlow( ) call 444 is
preferably the same as the BeginFlow( ) call 424 described
above. In response, the flow declaration component 226
directs the message generator 230 to generate and send a
Flow Update APD message 446 to the local policy enforcer
210. The Flow Update message 446 is similar to the Flow
Start message 426 and also includes one or more bindings
generated from the information stored in the respective
traffic flow data structure 234. Since the information
contained in the traffic flow data structure 234 has been up-dated
(through the issuance of the Set API calls 442), the bindings
will be different from the bindings appended to the original
Flow Start message 426.

At the local policy enforcer 210, the Flow Update message
446 is examined and a Request Policy Update message
448 is preferably formulated and sent to the policy server
216. The Request Policy Update message 448 has the same
general format as the original COPS Request Policy message
428, although it includes the new bindings generated as
a result of the Set API calls 442. The policy server 216
examines the Request Policy Update message 448 and, in
response, obtains the appropriate policy rules or service
treatments for this up-dated traffic flow. The policy server
216 then loads these up-dated policy rules or service
treatments in a Policy Decision Update message 450, which is
sent to the local policy enforcer 210. Since at least some of
the traffic characteristics have changed, the policies or
treatments contained in the Policy Decision Update message
450 may be different than the treatments previously
provided in the Policy Decision message 430. For example, the
up-dated policies may provide that this traffic flow is to be classified
as high priority and granted excellent effort treatment.
Similarly, the up-dated policies may provide that the DS
field 132 of packets or frames from this traffic flow should
be loaded with a DS codepoint associated with expedited
forwarding.
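
As an illustration of how a policy server might map up-dated application-level parameters to a treatment, consider the sketch below. The binding name "transaction_type", the returned dictionary and the selection rule are hypothetical. The expedited forwarding codepoint value 101110 comes from the IETF Differentiated Services specifications rather than from this text.

```python
from typing import Dict

EXPEDITED_FORWARDING = 0b101110  # DSCP 46, the EF codepoint (IETF value)
BEST_EFFORT = 0b000000           # default codepoint


def select_treatment(bindings: Dict[str, str]) -> Dict[str, object]:
    """Pick a treatment from application-level bindings (illustrative rule)."""
    if bindings.get("transaction_type") == "stock transaction":
        return {"priority": "high", "dscp": EXPEDITED_FORWARDING}
    return {"priority": "normal", "dscp": BEST_EFFORT}


def ds_field(dscp: int) -> int:
    """Place a six-bit codepoint in the upper bits of the DS field octet,
    leaving the low two bits zero."""
    return (dscp & 0x3F) << 2
```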

The Policy Decision Update message 450 is received at
the local policy enforcer 210 which modifies the corresponding
traffic flow state with the up-dated policies. The local
policy enforcer 210 also applies these up-dated policies to
any subsequently received packets or frames from the host/server
222 that satisfy the previously identified network and
transport layer parameters (e.g., IP addresses, port numbers



01/21/2004, EAST Version: 1.4.1 



US 6,651,101 B1


and transport protocol). Local policy enforcer 210 may also
provide feedback to component 226 as described above.

When the traffic flow between the application program
224 and end station 212 is finished, program 224 preferably
issues a ReleaseFlow( ) call 452 to the flow declaration
component 226 using the previously returned flow handle
H2 as an argument. Flow declaration component 226 may
return an acknowledgment or an error message to the
program 224. In response, the flow declaration component
226 directs message generator 230 to formulate a Flow End
APD message 454. The format of the Flow End message 454
is preferably the same as the Flow Start message 426
although the operation code field 520 is preferably loaded
with "2" to signify that it is a Flow End message. Although
the flow declaration component 226 forwards the Flow End
message 454 to the local policy enforcer 210, it preferably
does not discard the traffic flow data structure 234.

In response, the local policy enforcer 210 formulates a
COPS Request message 456 to inform the policy server 216
that the respective traffic flow is finished. The policy server
216 may reply with a Decision message 458 authorizing the
local policy enforcer 210 to erase the traffic flow state which
was established for this particular flow. If the application
program 224 subsequently initiates another traffic flow with
the same end station 212, it may re-use the information
stored in the traffic flow data structure 234 by issuing
another BeginFlow( ) call 424 utilizing the previously
returned bindings handle H1. The flow declaration component
226, in response, proceeds as described above,
sending a Flow Start message 426 to the local policy
enforcer 210.

The application program 224 may also issue a
DestroyBindings( ) call 460 to the flow declaration component
226 whenever it concludes that the bindings are no
longer needed. Program 224 preferably utilizes the previously
returned bindings handle H1 as an argument to the
DestroyBindings( ) call 460. In response, component 226
preferably discards the contents of the traffic flow data
structure 234 that corresponds to bindings handle H1.

When the application program 224 is closed it should
shut down all outstanding traffic flow services by issuing
corresponding ReleaseFlow( ) calls 452 and it should also
destroy all bindings that it created by issuing
DestroyBindings( ) calls 460. In response, component 226
directs message generator 230 to formulate a Client Close
APD message 462. The Client Close message 462 is simply
a header 510 with the operation code field 520 loaded with
the value "10". In response, the local policy enforcer 210
formulates and sends a COPS Request message 464 to the
policy server 216 indicating that the program 224 is closed.
The policy server 216 may reply with a COPS Decision
message 466 instructing the local policy enforcer 210 to
release all of the corresponding traffic flow states that were
previously established for the application program 224.

One skilled in the art will recognize that two or more of
the previously described API system calls may be combined
into a single call or that any one call may be broken down
into multiple calls. One skilled in the art will also recognize
that the particular names of the API system calls are
unimportant. Thus, it is an object of the present invention to cover
the foregoing communicating relation between the application
program 224 and the flow declaration component 226,
regardless of the particular implementation ultimately chosen.

It should also be understood that any set of values may be
inserted in the operation code field 520 of the APD messages
provided that each APD message type (e.g., Client Open,
Client Accept, Flow Start, etc.) has a different value assigned
to it. Furthermore, if a local policy enforcer is unable to
handle a particular application program or traffic flow (e.g.,
insufficient memory or other resources), it preferably
responds to the Client Open message with a Client Close
message, rather than a Client Accept message.

In the preferred embodiment, the flow declaration component
226 is implemented in software as a series of steps
executed at the host/server 222. Nonetheless, it should be
understood that the method may be implemented, either
wholly or in part, through one or more computer hardware
devices. Additionally, the present invention is preferably
utilized only with traffic flows of sufficient length (e.g.,
greater than 5-10 packets). The application program 224
may be configured not to request bindings or issue API calls
for short traffic flows.

It should be understood that some or all of the above
described functionality of the local policy enforcer 210 may
be located at the host/server 222. For example, the host/server
222 may include a traffic flow state machine engine
310 that is capable of sending and receiving COPS Request
and Decision messages directly to and from the policy server
216. In this case, the Client Open, Flow Start and Flow
Update messages are simply inter-process communications
within the host/server 222, rather than being forwarded
across the network. The operating system at the host/server
222 may also include one or more resources that may be
[illegible in source] queues [illegible in source].

It should be further understood that the local policy
enforcer 210 may make policy or service treatment decisions
for traffic flows identified by the flow declaration component
226 without querying the policy server 216. That is, the local
policy enforcer 210 may cache certain policy rules or
treatments.

In another aspect of the invention, the application program
224 may request policy decisions in advance of issuing
the BeginFlow( ) call 424. For example, program 224 may
only have a small number of application-level parameter
bindings. After creating the bindings (using only the
application-level parameters) as described above, the program
224 may issue a GetFlowDecision( ) system call to
component 226 and, in return, receive a handle, H3. Component
226 issues an Obtain Decision APD message to the
local policy enforcer 210 for each binding, including the
specified application-level parameters. The local policy
enforcer 210 will obtain the appropriate policy rules or
service treatments to be applied to these, as yet un-specified,
"flows" as described above.

When program 224 is about to begin a flow corresponding
to one of these bindings, it may issue a BeginFlow( ) call,
including the network and transport layer parameters for the
traffic flow and the handle H3 for the corresponding
application-level bindings. Component 226 then forwards
this information in a Flow Start message 426 to the local
policy enforcer 210 as described above. Since the local
policy enforcer 210 has already obtained the policy or
service treatments to be applied to this flow, it need not
query the policy server 216. Instead, the local policy
enforcer 210 simply monitors the traffic from host/server
222 and, when it identifies the specified traffic flow, applies
the previously received policy rules or service treatments.

Enhanced RSVP Messaging

In a further aspect of the invention, the flow declaration
component 226 may be configured to exchange one or more
modified Resource reSerVation Protocol (RSVP) messages
with the local policy enforcer 210 in place of the APD 
messages described above. RSVP is a well-known Internet 
Control protocol for reserving resources, typically 
bandwidth, between a sender entity and a receiver entity. 
RSVP is defined at Request for Comments (RFC) 2205, 
September 1997, from the Network Working Group of the 
IETF, and is hereby incorporated by reference in its entirety. 
The protocol defines two fundamental message types: RSVP 
path messages (Path) and reservation request messages 
(Resv). Basically, senders transmit Path messages downstream
throughout the network to potential receivers offering
to supply a given message stream. Receivers, wishing to 
obtain the proposed message stream, transmit Resv mes- 
sages that are propagated upstream all the way back to the 
sender. At each intermediate node in the network, bandwidth 
resources are reserved to ensure that the receiver will obtain
the message stream. 

In this embodiment of the present invention, component 
226, rather than generating and forwarding the Flow Start 
APD message 426 in response to the BeginFlow( ) call 424, 
formulates and sends a modified RSVP Path message to the 
local policy enforcer 210. FIG. 6 is a block diagram illus- 
trating the preferred format of a modified RSVP Path 
message 610. Modified Path message 610 carries the net- 
work and transport layer parameters and application-level 
parameters specified for the anticipated traffic flow. In 
particular, message 610 preferably includes at least three 
elements: an RSVP header 612, a first area 614 (which 
carries the network and transport layer parameters) and at 
least one RSVP Policy-Data object 616 (which carries the 
application-level parameters). As provided in RFC 2205, the 
RSVP header includes a version field 618, a flags field 620, 
a message type field 622, an RSVP checksum field 624, a 
Send Time To Live (TTL) field 626, a reserved field 628 and 
an RSVP length field 630. 

Component 226 preferably loads version field 618, which 
corresponds to the version of RSVP, with the appropriate 
value (e.g., "1"). Flags field 620 is preferably de-asserted as 
no flags are presently defined. Message type field 622, which 
indicates the type of message (e.g., "1" for RSVP Path 
messages and "2" for RSVP Resv messages) is preferably 
loaded with the value "1" to indicate that message 610 is a 
Path message. It should be understood that field 622 may
alternatively be loaded with a new value to indicate that
message 610 is a modified RSVP Path message. The RSVP
Checksum field 624 may be loaded with a computed
checksum for message 610. The Send_TTL field 626 is preferably
loaded with an IP time to live value, and the RSVP length 
field 630 preferably contains the length of message 610. 
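
The header fields listed above can be packed as in the following sketch. The field widths follow the RSVP common header of RFC 2205 (a 4-bit version, 4-bit flags, 8-bit message type, 16-bit checksum, 8-bit Send_TTL, 8-bit reserved field and 16-bit length); the function names and the default time to live of 64 are assumptions made for the example.

```python
import struct

RSVP_VERSION = 1
MSG_TYPE_PATH = 1
MSG_TYPE_RESV = 2
HEADER_FORMAT = "!BBHBBH"  # vers/flags, type, checksum, TTL, reserved, length


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_rsvp_message(msg_type: int, body: bytes, send_ttl: int = 64) -> bytes:
    """Prefix body with a common header; flags are left de-asserted."""
    length = 8 + len(body)
    vers_flags = RSVP_VERSION << 4
    # The checksum is computed with the checksum field set to zero.
    draft = struct.pack(HEADER_FORMAT, vers_flags, msg_type, 0, send_ttl, 0, length)
    checksum = internet_checksum(draft + body)
    header = struct.pack(HEADER_FORMAT, vers_flags, msg_type, checksum,
                         send_ttl, 0, length)
    return header + body
```

A receiver can verify such a message by running the same checksum over all of its bytes and expecting zero.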

The first area 614 preferably includes an RSVP sender
template object 632 and an RSVP session object 634, each
having a plurality of fields. More specifically, the sender
template and session objects 632, 634 each have a length
field 638 (loaded with the length of the respective object), a
class number (C-Num) field 640 and a class type (C-Type)
field 642. For the sender template object 632, which further
includes an IP source address (SA) field 644, a source port
number field 646 and may include one or more un-used
fields 648, the respective C-Num field 640 is preferably
loaded with "11" to signify that it is an RSVP sender
template object and the respective C-Type field 642 may be
loaded with "1" to indicate that fields 644 and 646 carry the
IPv4 address and the TCP/UDP port number, respectively, at
host/server 222 for the anticipated traffic flow. For the
session object 634, which further includes an IP destination
address (DA) field 650, a transport protocol field 652, a flags
field 654 and a destination port number field 656, the
respective C-Num field 640 is loaded with "1" to signify that
it is an RSVP session object and the respective C-Type field
642 may be loaded with "1" to indicate that fields 650 and
656 carry the IPv4 address and the TCP/UDP port number,
respectively, for the corresponding process at end station
212 for the anticipated traffic flow. Component 226 may
assert flags field 654 if it is capable of policing its own traffic
flows.

One skilled in the art will recognize that first area 614 of
modified RSVP Path message 610 may be modified in any
number of ways, including fewer or additional fields or to
carry IPv6 information.

The RSVP Policy_Data object 616 also has a length field
638, a C-Num field 640 and a C-Type field 642. In addition,
RSVP Policy_Data object 616 includes a policy_data object
field 658. The respective length field 638 carries the length
of object 616 and the respective C-Num field is loaded with
"14" to indicate that field 658 is a policy_data object field.
The C-Type field 642 of object 616 is preferably loaded with
a new value (e.g., "2") to signify that policy_data object
field 658 carries application-level parameters. Furthermore,
policy_data object field 658 is loaded by component 226
with the application-level bindings specified by program 224
preferably in the manner as described above with reference
to FIG. 5B.

One skilled in the art will also recognize that the
application-level parameters may be carried in multiple
RSVP Policy_Data objects 616.
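
The three objects described above can be sketched as follows. The object header (16-bit length, 8-bit C-Num, 8-bit C-Type) and the IPv4 layouts of the session and sender template objects follow RFC 2205. The encoding of the application-level bindings is not restated in this passage, so the sketch treats them as opaque bytes; the function names are likewise hypothetical.

```python
import socket
import struct

C_NUM_SESSION = 1
C_NUM_SENDER_TEMPLATE = 11
C_NUM_POLICY_DATA = 14
C_TYPE_IPV4 = 1
C_TYPE_APP_PARAMS = 2  # the example "new value" given in the text


def rsvp_object(c_num: int, c_type: int, payload: bytes) -> bytes:
    """Prefix payload with length, C-Num and C-Type, padded to 4 bytes."""
    payload += b"\x00" * (-len(payload) % 4)
    return struct.pack("!HBB", 4 + len(payload), c_num, c_type) + payload


def sender_template(src_ip: str, src_port: int) -> bytes:
    # Source address, two un-used bytes, source port.
    payload = socket.inet_aton(src_ip) + struct.pack("!HH", 0, src_port)
    return rsvp_object(C_NUM_SENDER_TEMPLATE, C_TYPE_IPV4, payload)


def session(dst_ip: str, protocol: int, flags: int, dst_port: int) -> bytes:
    # Destination address, transport protocol, flags, destination port.
    payload = socket.inet_aton(dst_ip) + struct.pack("!BBH", protocol, flags, dst_port)
    return rsvp_object(C_NUM_SESSION, C_TYPE_IPV4, payload)


def policy_data(bindings: bytes) -> bytes:
    # Bindings are carried as opaque bytes in this sketch.
    return rsvp_object(C_NUM_POLICY_DATA, C_TYPE_APP_PARAMS, bindings)
```

Concatenating these objects after an RSVP common header would give the body of a modified Path message of the kind described.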

This modified RSVP Path message 610 is preferably
handed to the communication facility 228 for forwarding to
the local policy enforcer 210 where it is examined. In
response, the local policy enforcer 210 and the policy server
216 exchange Request Policy 428 and Policy Decision 430
messages, as described above, in order to obtain the policy
rules or service treatments to be applied to the traffic flow
identified in the modified RSVP Path message 610. Local
policy enforcer 210 also extracts and stores the network and
transport layer parameters from the RSVP Sender Template
object 614 in order to identify the particular traffic flow from
host/server 222.

The local policy enforcer 210 may also reply to component
226 with a modified RSVP Resv message rather than
the Decision Feedback message 432. This modified RSVP
Resv message preferably includes a header similar to header
612, but with the message type field 622 loaded with the
value "2" to indicate that it is an RSVP Resv message or
with a new value to indicate that it is a modified RSVP Resv
message. The modified RSVP Resv message also includes
one or more RSVP Policy_Data objects similar to object
616. In this case, however, object 616 carries the decision
bindings for the anticipated traffic flow as described above.
Component 226 may extract these decision bindings in order
to provide feedback to application 224.

As shown, component 226 utilizes a modified RSVP Path
message 610 to identify network and transport layer
parameters and application-level parameters to the local policy
enforcer 210. The modified RSVP Path message 610,
moreover, is preferably not forwarded by the local policy
enforcer 210, unlike conventional RSVP Path and Resv
messages which are propagated all the way between the
sender and receiver entities.

It should be understood that the local policy enforcer 210
is preferably in close proximity to host/server 222 so that the
classification of packets or frames from the anticipated
traffic flow occurs early in their journey through the network
200. It should also be understood that the traffic flow from
end station 212 to host/server 222 may similarly be identified
and appropriate policy rules or service treatments
applied thereto. It should be further understood that the flow
declaration component 226 is configured to handle and
separately identify multiple traffic flows from multiple
application programs executing at the host/server 222 so that the
appropriate policy rules or service treatments may be
individually applied to each such traffic flow through the
network 200. For example, program 224 may be simultaneously
sending a print transaction to the print server 214.

The foregoing description has been directed to specific
embodiments of the invention. It will be apparent, however,
that other variations and modifications may be made to the
described embodiments, with the attainment of some or all
of their advantages. For example, other client-server
communications protocols, besides COPS, may be utilized by
the policy server and the local policy enforcer. In addition,
the present invention may also be utilized with other
network layer protocols, such as IPv6, whose addresses are 128
bits long. Therefore, it is the object of the appended claims
to cover all such variations and modifications as come
within the true spirit and scope of the invention.

What is claimed is: 

1. A method for applying a service treatment to a plurality
of network messages issued by a network entity connected
to a computer network, the network messages corresponding
to a traffic flow, the computer network configured to support
transport and network communication layers and having a
policy enforcer, the method comprising the steps of:
receiving from the policy enforcer a request policy message
identifying the traffic flow and including one or
more application-level parameters;
utilizing at least some of the included application-level
parameters to select one or more service treatments to
be applied to the traffic flow;
generating a policy decision message containing the one
or more service treatments selected for the traffic flow;
and
sending the policy decision message to the policy
enforcer.

2. The method of claim 1 wherein the application-level 
parameters specify one or more of the following character- 
istics: the size of a file being transmitted, a video segment 
name, a video segment viewer, a user name, a user
department, an application identifier, a transaction type, a 
transaction name, an application state, a calling party, a 
called party, a compression method, a service level, a 
uniform resource locator (URL) and a mime type. 

3. The method of claim 2 wherein the one or more
selected service treatments specified in the policy decision
message includes instructions for marking network messages
corresponding to the traffic flow with one or more of
a selected Differentiated Services Codepoint (DSCP), a
selected Type of Service (ToS), and a selected user_priority.

4. The method of claim 1 further comprising the steps of: 
formulating one or more classification rules for instruct- 
ing the policy enforcer to mark network messages 
corresponding to the traffic flow with one or more of a 
selected Differentiated Services Codepoint (DSCP), a
selected Type of Service (ToS), and a selected
user_priority; and

loading the policy decision message with the one or more 
classification rules. 

5. The method of claim 4 wherein the policy enforcer has
a plurality of queues and queue scheduling algorithms, the
method further comprising the steps of:
formulating one or more behavioral rules for instructing
the policy enforcer to map network messages marked
with a selected DSCP, ToS and/or user_priority to a
particular queue and to apply a designated queue
scheduling algorithm.

6. The method of claim 5 wherein a behavioral rule 
instructs the policy enforcer to apply a Weighted Fair 
Queuing (WFQ) queue scheduling algorithm. 

7. The method of claim 1 wherein the computer network 
further has a repository configured to store information, and 
the step of utilizing comprises the steps of: 

requesting information from the repository; and
utilizing the information from the repository in selecting
the one or more service treatments to be applied to the
traffic flow.

8. The method of claim 1 further comprising the step of 
sending a decision change message to the policy enforcer, 
following the step of sending the policy decision message, 
wherein the decision change message contains one or more 
service treatments that differ from the service treatments 
specified by the policy decision message. 

9. The method of claim 1 further comprising the steps of: 
receiving a request policy update message from the policy
server, the request policy update message containing
traffic flow and/or application-level parameters that
differ from the traffic flow and/or application-level
parameters contained in the request policy message;

utilizing the traffic flow and/or application-level
parameters from the request policy update message to select
one or more new service treatments;

generating a policy decision update message containing 
the one or more new service treatments; and 

sending the policy decision update message to the policy 
enforcer. 

10. A computer readable medium containing executable 
program instructions for use in applying a service treatment 
to a plurality of network messages issued by a network entity 
connected to a computer network, the network messages 
corresponding to a traffic flow, the computer network con- 
figured to support transport and network communication 
layers and having a policy server, the executable program 
instructions comprising program instructions for: 

receiving from the network entity a message identifying 
the traffic flow and including one or more application- 
level parameters; 

generating a request policy message for the identified 
traffic flow, the request policy message containing at 
least some of the application-level parameters included 
in the message; 

sending the request policy message to the policy server; 
and 

receiving a policy decision message from the policy 
server specifying one or more service treatments to be 
applied to the traffic flow, the one or more service 
treatments based, at least in part, upon the application- 
level parameters contained in the request policy mes- 
sage. 

11. The computer readable medium of claim 10 further 
comprising program instructions for: 

identifying network messages corresponding to the traffic 
flow; and 

applying the one or more service treatments specified in 
the policy decision message to those network messages 
identified as corresponding to the traffic flow. 

12. The computer readable medium of claim 10 wherein
the application-level parameters included in the request
policy message specify one or more of the following
characteristics: the size of a file being transmitted, a video
segment name, a video segment viewer, a user name, a user
department, an application identifier, a transaction type, a
transaction name, an application state, a calling party, a
called party, a compression method, a service level, a
uniform resource locator (URL) and a mime type.

13. The computer readable medium of claim 10 wherein
the one or more service treatments specified in the policy
decision message include instructions for marking network
messages corresponding to the traffic flow with one or more
of a selected Differentiated Services Codepoint (DSCP), a
selected Type of Service (ToS), and a selected user_priority.

14. The computer readable medium of claim 12 further
comprising program instructions for generating and sending
to the policy server one or more client accept messages
carrying a keep alive timer value.

15. A policy server for use in applying a service treatment
to a plurality of network messages issued by a network entity
connected to a computer network, the network messages
corresponding to a traffic flow, the computer network
configured to support transport and network communication
layers and having a policy enforcer, the policy server
comprising:

means for sending and receiving messages to and from the
policy enforcer via the computer network; and

means, responsive to receipt of a request policy message
identifying the traffic flow and containing one or more
application-level parameters, for selecting one or more
service treatments to be applied to the traffic flow,
wherein

the one or more service treatments are selected based at
least in part on the one or more application-level
parameters, and
the message sending and receiving means sends a
policy decision message to the policy enforcer
carrying the one or more selected service treatments.

16. A policy enforcer for use in applying a service 
treatment to a plurality of network messages issued by an 
application program running on a network entity connected 
to a computer network, the network messages corresponding 
to a traffic flow, the computer network configured to support 
transport and network communication layers and having a 
policy server, the policy enforcer comprising: 

means for sending and receiving messages via the com- 
puter network to the network entity and the policy 
server; and 

means for applying varying service treatments to traffic 

flows, wherein 
in response to receiving a flow start message from the 
network entity identifying the traffic flow and contain- 
ing one or more application-level parameters, the 
policy enforcer generates and sends to the policy server 
a request policy message for the identified traffic flow, 
the request policy message containing at least some of 
the application-level parameters included in the flow 
start message, 

in response to receiving a policy decision message from 
the policy server specifying one or more service 
treatments, applying the one or more service treatments 
to the traffic flow from the network entity, and 
the one or more service treatments based, at least in part, 
upon the application-level parameters contained in the 
request policy message. 

***** 



01/21/2004, EAST Version: 1.4.1 



(12) United States Patent 

McCloghrie et al. 



US006286052B1

(10) Patent No.: US 6,286,052 B1
(45) Date of Patent: Sep. 4, 2001 



(54) METHOD AND APPARATUS FOR 

IDENTIFYING NETWORK DATA TRAFFIC 
FLOWS AND FOR APPLYING QUALITY OF 
SERVICE TREATMENTS TO THE FLOWS 

(75) Inventors: Keith McCloghrie, San Jose, CA (US); 

Silvano Gai, Vigliano d'Asti (IT); Shai
Mohaban, Sunnyvale, CA (US) 

(73) Assignee: Cisco Technology, Inc., San Jose, CA 
(US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/206,067 

(22) Filed: Dec. 4, 1998 

(51) Int. Cl.7 G06F 13/00

(52) U.S. Cl. 709/238; 709/232; 709/250;

709/236; 370/235 

(58) Field of Search 709/102, 103, 

709/104, 227, 228, 229, 230, 232, 233, 
234, 235, 236, 238, 240, 245, 250; 370/229, 

230, 235 

(56) References Cited 

U.S. PATENT DOCUMENTS 

4,769,810    9/1988   Eckberg, Jr. et al.   370/60
4,769,811    9/1988   Eckberg, Jr. et al.   370/60
5,224,099    6/1993   Corbalis et al.       370/94.2
5,263,157   11/1993   Janis                 395/600
5,473,599   12/1995   Li et al.             370/16
5,606,668    2/1997   Shwed                 395/200.1
5,666,353    9/1997   Klausmeier et al.     370/230
5,751,967    5/1998   Raab et al.           395/200.58
5,819,042   10/1998   Hansen                395/200.52
5,832,503   11/1998   Malik et al.          707/104
5,842,040 * 11/1998   Hughes et al.         710/11
5,872,928    2/1999   Lewis et al.          395/200.52
5,889,953    3/1999   Thebaut et al.        395/200.51
5,987,513   11/1999   Prithviraj et al.     709/223
6,041,347    3/2000   Harsham et al.        709/220
6,046,980 *  4/2000   Packer                370/230
6,047,322    4/2000   Vaid et al.           709/224
6,091,709 *  7/2000   Harrison et al.       370/235
6,104,700 *  8/2000   Haddock et al.        370/235

OTHER PUBLICATIONS 

Ortiz, Jr., S., "Active Networks: The Programmable Pipeline",
Computer, pp. 19-21, Aug. 1998.
IEEE P802.1D Standard (draft 15) "Local and Metropolitan 
Area Networks", pp. 1, 50-56 and 378-381 (Nov. 1997). 

(List continued on next page.) 

Primary Examiner — Viet D. Vu 

(74) Attorney, Agent, or Firm—Cesari & McKenna, LLP 

(57) ABSTRACT 

A system within a computer network identifies specific 
traffic flows originating from a given network entity and 
requests and applies appropriate policy rules or service 
treatments to the traffic flows. A network entity includes a 
flow declaration component that communicates with one or 
more application programs executing on the entity. The flow 
declaration component includes a message generator and an 
associated memory for storing one or more traffic flow data 
structures. For a given traffic flow, the application program 
issues one or more calls to the flow declaration component 
providing it with information identifying the traffic flows. 
The flow declaration component then opens a flow manage- 
ment session with a local policy enforcer that obtains policy 
rules or service treatments for the identified flow from a 
policy server and applies those rules or treatments to the 
specific traffic flows from the network entity. 

47 Claims, 10 Drawing Sheets 






US 6,286,052 B1

Page 2 



OTHER PUBLICATIONS 

"An Emerging Trend in the Internet Services Market", 

Hewlett-Packard Corp. (date unknown). 

Wroclawski, J., "The Use of RSVP with IETF Integrated
Services", IETF Network Working Group (Sep. 1997).
Bernet, Y. et al., "A Framework for Use of RSVP with
Diff-serv Networks", IETF (Nov. 1998).
Bernet, Y. et al., "Requirements of Diff-serv Boundary
Routers", IETF Differentiated Services (Nov. 1998).
Yadav, S. et al., "Identity Representation for RSVP", IETF
(Jan. 1999).
Heinanen, J. et al., "Assured Forwarding PHB Group", IETF
(Sep. 1998).
Jacobson, V. et al., "An Expedited Forwarding PHB", IETF
Differentiated Services Working Group (Aug. 1998).
Nichols, K. et al., "Definition of the Differentiated Services
Field (DS Field) in the IPv4 and IPv6 Headers", IETF
Differentiated Services Working Group (Aug. 1998).
Blake, S. et al., "An Architecture For Differentiated
Services", IETF Differentiated Services Working Group
(Aug. 1998).
Bernet, Y. et al., "A Framework for End-to-End QoS
Combining RSVP/Interserv and Differentiated Services",
IETF (Mar. 1998).
Yavatkar, R. et al., "A Framework for Policy-based
Admission Control", IETF (Nov. 1997).
Boyle, J. et al., "The COPS (Common Open Policy Service)
Protocol", IETF (Aug. 1998).
Reichmeyer, F. et al., "COPS Usage for Differentiated
Services", IETF Network Working Group (Aug. 1998).
"Cisco IOS® Software Quality of Service Solutions", Cisco
Systems, Inc. (Jul. 1998).
"Queuing, Traffic Shaping, and Filtering", Cisco Systems,
Inc. (Sep. 1996).
"Network Node Registry Overview" (Jan. 29, 1998).
"Network Node Registry User's Guide" (Apr. 1997). 
"Network Node Registry — Access Control Lists" (Apr. 
1997). 

"Quality of Service Policy Propagation via Border Gateway 

Protocol", Cisco Systems, Inc. (Feb. 1998). 

"Distributed Weighted Random Early Detection", Cisco 

Systems, Inc., pp. 1-6 (Feb. 1998). 

"Distributed Weighted Fair Queuing", Cisco Systems, Inc. 

(Mar. 1998). 

"Action Request System®", Remedy Corporation (1998). 
"3COM's Framework for Delivering Policy-Powered Net- 
works", 3Com Corporation (Jun. 1998). 

* cited by examiner 






U.S. Patent    Sep. 4, 2001    Sheet 1 of 10    US 6,286,052 B1

[FIGS. 1A-1C: partial block diagrams of network messages (drawing not reproduced)]



U.S. Patent    Sep. 4, 2001    Sheet 2 of 10    US 6,286,052 B1

[FIG. 2: highly schematic block diagram of computer network 200 (drawing not reproduced)]



U.S. Patent    Sep. 4, 2001    Sheet 3 of 10    US 6,286,052 B1

[FIG. 3: partial block diagram of the local policy enforcer (drawing not reproduced)]



U.S. Patent    Sep. 4, 2001    Sheet 4 of 10    US 6,286,052 B1

[FIG. 4A: flow diagram (drawing not reproduced).
Columns: APPLICATION PROGRAM (224), FLOW DECLARATION COMPONENT (226),
LOCAL POLICY ENFORCER (210), POLICY SERVER (216).
Labeled calls: StartUp( ) 410; NewBindings( ) 412;
SetSourcePort( ), SetSourceIP( ), SetTransportProtocol( ),
SetDestinationPort( ), SetDestinationIP( ) (414a-414e);
SetInteger( ), SetFloat( ), SetString( ), Set( ) (416a-416d).]



U.S. Patent    Sep. 4, 2001    Sheet 5 of 10    US 6,286,052 B1

[FIG. 4B: flow diagram (drawing not reproduced).
Columns: APPLICATION PROGRAM (224), FLOW DECLARATION COMPONENT (226),
LOCAL POLICY ENFORCER (210), POLICY SERVER (216).
Labeled calls: GetSourcePort( ), GetSourceIP( ), GetSourceProtocol( ),
GetDestinationPort( ), GetDestinationIP( ) (418 series); BeginFlow( ) 424.
Labeled messages: CLIENT OPEN 420; CLIENT ACCEPT 422; FLOW START 426;
REQUEST POLICY 428; POLICY DECISION 430; DECISION FEEDBACK 432;
CALL BACK FUNCTION 434.]



U.S. Patent    Sep. 4, 2001    Sheet 6 of 10    US 6,286,052 B1

[FIG. 4C: flow diagram (drawing not reproduced).
Columns: APPLICATION PROGRAM (224), FLOW DECLARATION COMPONENT (226),
LOCAL POLICY ENFORCER (210), POLICY SERVER (216).
Labeled calls: SetInteger( ) and related calls (442a-442d);
BeginUpdatedFlow( ) 444.
Labeled messages: KEEP ALIVE 436; DECISION CHANGE 438; KEEP ALIVE 440;
FLOW UPDATE 446; REQUEST POLICY UPDATE 448; POLICY DECISION UPDATE 450.]



U.S. Patent    Sep. 4, 2001    Sheet 7 of 10    US 6,286,052 B1

[FIG. 4D: flow diagram (drawing not reproduced).
Columns: APPLICATION PROGRAM (224), FLOW DECLARATION COMPONENT (226),
LOCAL POLICY ENFORCER (210), POLICY SERVER (216).
Labeled calls: ReleaseFlow( ) 452; DestroyBindings( ) 460.
Labeled messages: FLOW END 454; REQUEST 456; DECISION 458;
CLIENT CLOSE 462; REQUEST 464; DECISION 466.]



U.S. Patent    Sep. 4, 2001    Sheet 8 of 10    US 6,286,052 B1

[FIG. 5A: format of an application parameter declaration message,
reference numeral 420 (drawing not reproduced).
Fields shown, in order: VERSION, FLAGS, OPERATION CODE, UNUSED,
MESSAGE LENGTH; LENGTH, C-NUM, C-TYPE, UNUSED, UNUSED,
KEEP ALIVE TIMER VALUE (reference numerals 516-536).]



U.S. Patent    Sep. 4, 2001    Sheet 9 of 10    US 6,286,052 B1

[FIG. 5B: format of an application parameter declaration message,
reference numeral 426 (drawing not reproduced).
Header fields: VERSION 516, FLAGS 518, OPERATION CODE 520,
MESSAGE LENGTH 524.
Handle fields: LENGTH 542, C-NUM 544, C-TYPE 546, DEVICE HANDLE 548,
FLOW HANDLE 550.
Three repeated groups (suffixes a, b, c): LENGTH 558, POLICY ID TYPE 560,
POLICY IDENTIFIER 562, LENGTH 564, ENCAPSULATION TYPE 566,
ENCODED POLICY ELEMENT 568.]



U.S. Patent    Sep. 4, 2001    Sheet 10 of 10    US 6,286,052 B1

[FIG. 6: enhanced RSVP message 610 (drawing not reproduced).
Header fields: VERS 618, FLAGS 620, MSG. TYPE 622, RSVP CHECKSUM 624,
SEND_TTL 626, (RESERVED) 628, RSVP LENGTH 630.
First object: LENGTH 638, C-NUM 640, C-TYPE 642, IP SA 644,
SOURCE PORT 646.
Second object: LENGTH 638, C-NUM 640, C-TYPE 642, IP DA 650,
PROTOCOL 652, FLAGS 654, DESTINATION PORT 656.
Third object: LENGTH 638, C-NUM 640, C-TYPE 642,
POLICY DATA (APPLICATION-LEVEL PARAMETERS) 658.]




METHOD AND APPARATUS FOR 
IDENTIFYING NETWORK DATA TRAFFIC 
FLOWS AND FOR APPLYING QUALITY OF 
SERVICE TREATMENTS TO THE FLOWS 

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

This application is related to the following copending 
U.S. patent application: 

U.S. patent application Ser. No. 09/179,036 entitled, 
METHOD AND APPARATUS FOR DEFINING AND 
IMPLEMENTING HIGH-LEVEL QUALITY OF SERVICE
POLICIES IN COMPUTER NETWORKS, filed Oct.
26, 1998, now U.S. Pat. No. 6,167,495, and assigned to the 
assignee of the present application. 

FIELD OF THE INVENTION 

The present invention relates generally to computer 
networks, and more specifically, to a method and apparatus 
for identifying network data traffic flows and for applying 
quality of service or policy treatments thereto. 

BACKGROUND OF THE INVENTION 

A computer network typically comprises a plurality of 
interconnected entities that transmit (i.e., "source") or 
receive (i.e., "sink") data frames. A common type of com- 
puter network is a local area network ("LAN") which 
typically refers to a privately owned network within a single 
building or campus. LANs employ a data communication 
protocol (LAN standard), such as Ethernet, FDDI or Token 
Ring, that defines the functions performed by the data link 
and physical layers of a communications architecture (i.e., a 
protocol stack), such as the Open Systems Interconnection 
(OSI) Reference Model. In many instances, multiple LANs 
may be interconnected by point-to-point links, microwave 
transceivers, satellite hook-ups, etc., to form a wide area
network ("WAN"), metropolitan area network ("MAN") or 
intranet. These LANs and/or WANs, moreover, may be 
coupled through one or more gateways to the Internet. 

Each network entity preferably includes network
communication software, which may operate in accordance with
the well-known Transmission Control Protocol/Internet
Protocol (TCP/IP). TCP/IP basically consists of a set of
rules defining how entities interact with each other. In
particular, TCP/IP defines a series of communication layers,
including a transport layer and a network layer. At the
transport layer, TCP/IP includes both the User Datagram
Protocol (UDP), which is a connectionless transport
protocol, and TCP, which is a reliable, connection-oriented
transport protocol. When a process at one network entity
wishes to communicate with another entity, it formulates one
or more messages and passes them to the upper layer of the
TCP/IP communication stack. These messages are passed down
through each
layer of the stack where they are encapsulated into packets 
and frames. Each layer also adds information in the form of 
a header to the messages. The frames are then transmitted 
over the network links as bits. At the destination entity, the 
bits are re-assembled and passed up the layers of the 
destination entity's communication stack. At each layer, the 
corresponding message headers are also stripped off, thereby 
recovering the original message which is handed to the 
receiving process. 
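
The layering described above can be illustrated with a minimal sketch. It is not taken from the patent: the layer names and the bracketed text "headers" are hypothetical stand-ins for real protocol headers, chosen only to show that each layer prepends a header on the way down and the receiver strips them in the reverse order.

```python
# Hypothetical layer names, listed in the order a message travels down
# the sender's stack. Real headers are binary; text tags are used here
# only to make the nesting visible.
LAYERS = ["transport", "network", "data_link"]


def encapsulate(message: bytes) -> bytes:
    """Prepend one header per layer as the message moves down the stack."""
    for layer in LAYERS:
        message = f"[{layer}]".encode() + message
    return message


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order as the frame moves up the stack."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

The outermost header belongs to the lowest layer, so the receiving stack removes the data link header first and hands the original message to the receiving process last.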

One or more intermediate network devices are often used
to couple LANs together and allow the corresponding entities
to exchange information. For example, a bridge may be
used to provide a "bridging" function between two or more
LANs. Alternatively, a switch may be utilized to provide a
"switching" function for transferring information, such as
data frames or packets, among entities of a computer
network. Typically, the switch is a computer having a plurality
of ports that couple the switch to several LANs and to other
switches. The switching function includes receiving data
frames at a source port and transferring them to at least one
destination port for receipt by another entity. Switches may
operate at various levels of the communication stack. For
example, a switch may operate at layer 2 which, in the OSI
Reference Model, is called the data link layer and includes
the Logical Link Control (LLC) and Media Access Control
(MAC) sub-layers.

Other intermediate devices, commonly referred to as
routers, may operate at higher communication layers, such
as layer 3, which in TCP/IP networks corresponds to the
Internet Protocol (IP) layer. IP data packets include a
corresponding header which contains an IP source address and
an IP destination address. Routers or layer 3 switches may
re-assemble or convert received data frames from one LAN
standard (e.g., Ethernet) to another (e.g., Token Ring). Thus,
layer 3 devices are often used to interconnect dissimilar
subnetworks. Some layer 3 intermediate network devices
may also examine the transport layer headers of received
messages to identify the corresponding TCP or UDP port
numbers being utilized by the corresponding network
entities. Many applications are assigned specific, fixed TCP
and/or UDP port numbers in accordance with Request for
Comments (RFC) 1700. For example, TCP/UDP port
number 80 corresponds to the hypertext transfer protocol
(HTTP), while port number 21 corresponds to file transfer
protocol (FTP) service.

Allocation of Network Resources

Computer networks include numerous services and
resources for use in moving traffic throughout the network.
For example, different network links, such as Fast Ethernet,
Asynchronous Transfer Mode (ATM) channels, network
tunnels, satellite links, etc., offer unique speed and
bandwidth capabilities. Particular intermediate devices also
include specific resources or services, such as number of
priority queues, filter settings, availability of different queue
selection strategies, congestion control algorithms, etc.

Individual frames or packets, moreover, can be marked so
that intermediate devices may treat them in a predetermined
manner. For example, the Institute of Electrical and
Electronics Engineers (IEEE), in an appendix (802.1p) to the
802.1D bridge standard, describes additional information for
the MAC header of Data Link Layer frames. FIG. 1A is a
partial block diagram of a Data Link frame 100 which
includes a MAC destination address (DA) field 102, a MAC
source address (SA) field 104 and a data field 106. In
accordance with the 802.1Q standard, a user_priority field
108, among others, is inserted after the MAC SA field 104.
The user_priority field 108 may be loaded with a
predetermined value (e.g., 0-7) that is associated with a particular
treatment, such as background, best effort, excellent effort,
etc. Network devices, upon examining the user_priority
field 108 of received Data Link frames 100, apply the
corresponding treatment to the frames. For example, an
intermediate device may have a plurality of transmission
priority queues per port, and may assign frames to different
queues of a destination port on the basis of the frame's
user_priority value.
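
The queue assignment just described can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular device: the number of queues (four), the `PRIORITY_TO_QUEUE` grouping and the strict-priority service order are all hypothetical choices, and frames are modeled as plain dictionaries.

```python
from collections import deque

NUM_QUEUES = 4

# Hypothetical grouping of the eight user_priority values (0-7) onto four
# transmission queues of a port; higher queue index means served earlier.
PRIORITY_TO_QUEUE = {0: 1, 1: 0, 2: 0, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}


class Port:
    """A destination port with several transmission priority queues."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]

    def enqueue(self, frame):
        """Place a frame on the queue selected by its user_priority value."""
        priority = frame["user_priority"]
        if priority not in PRIORITY_TO_QUEUE:
            raise ValueError("user_priority must be in the range 0-7")
        self.queues[PRIORITY_TO_QUEUE[priority]].append(frame)

    def dequeue(self):
        """Return the next frame, serving the highest-numbered queue first."""
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

With this arrangement a frame marked 7 that arrives after a frame marked 1 is still transmitted first, which is the effect the marking is meant to achieve.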

FIG. 1B is a partial block diagram of a Network Layer
packet 120 corresponding to the Internet Protocol. Packet
120 includes a type_of_service (ToS) field 122, a protocol
field 124, an IP source address (SA) field 126, an IP
destination address (DA) field 128 and a data field 130. The
ToS field 122 is used to specify a particular service to be
applied to the packet 120, such as high reliability, fast
delivery, accurate delivery, etc., and comprises a number of
sub-fields (not shown). The sub-fields include a three bit IP
precedence (IPP) field and three one bit flags (Delay,
Throughput and Reliability). By setting the various flags, an
entity may indicate which overall service it cares most about
(e.g., Throughput versus Reliability). Version 6 of the
Internet Protocol (IPv6) similarly defines a traffic class field,
which is also intended to be used for defining the type of
service to be applied to the corresponding packet.

Recently, a working group of the Internet Engineering
Task Force (IETF), which is an independent standards
organization, has proposed replacing the ToS field 122 of
Network Layer packets 120 with a one octet differentiated
services (DS) field 132 that can be loaded with a
differentiated services codepoint. Layer 3 devices that are DS
compliant apply a particular per-hop forwarding behavior to
data packets based on the contents of their DS fields 132.
Examples of per-hop forwarding behaviors include
expedited forwarding and assured forwarding. The DS field 132
is typically loaded by DS compliant intermediate devices
located at the border of a DS domain, which is a set of DS
compliant intermediate devices under common network
administration. Thereafter, interior DS compliant devices
along the path simply apply the corresponding forwarding
behavior to the packet 120.

FIG. 1C is a partial block diagram of a Transport Layer
packet 150. The transport layer packet 150 preferably
includes a source port field 152, a destination port field 154
and a data field 156, among others. Fields 152 and 154 are
preferably loaded with the predefined or dynamically
agreed-upon TCP or UDP port numbers being utilized by the
corresponding network entities.

Service Level Agreements

To interconnect dispersed computer networks, many
organizations rely on the infrastructure and facilities of internet
service providers (ISPs). For example, an organization may
lease a number of T1 lines to interconnect various LANs.
These organizations and ISPs typically enter into service
level agreements, which include one or more traffic
specifiers. These traffic specifiers may place limits on the amount
of resources that the subscribing organization will consume
for a given charge. For example, a user may agree not to
send traffic that exceeds a certain bandwidth (e.g., 1 Mb/s).
Traffic entering the service provider's network is monitored
(i.e., "policed") to ensure that it complies with the relevant
traffic specifiers and is thus "in-profile". Traffic that exceeds
a traffic specifier (i.e., traffic that is "out-of-profile") may be
dropped or shaped or may cause an accounting change (i.e.,
causing the user to be charged a higher rate). Another option
is to mark the traffic as exceeding the traffic specifier, but
nonetheless allow it to proceed through the network. If there
is congestion, an intermediate network device may drop
such "marked" traffic first in an effort to relieve the
congestion.

Multiple Traffic Flows

A process executing at a given network entity, moreover,
may generate hundreds if not thousands of traffic flows that
are transmitted across the corresponding network every day.
A traffic flow generally refers to a set of messages (frames
and/or packets) that typically correspond to a particular task,
transaction or operation (e.g., a print transaction) and may be
identified by 5 network and transport layer parameters (e.g.,
source and destination IP addresses, source and destination
TCP/UDP port numbers and transport protocol).
Furthermore, the treatment that should be applied to these
different traffic flows varies depending on the particular
traffic flow at issue. For example, an on-line trading
application may generate stock quote messages, stock transaction
messages, transaction status messages, corporate financial
information messages, print messages, data back-up
messages, etc. A network administrator, moreover, may wish
to have very different policies or service treatments applied
to these various traffic flows. In particular, the network
administrator may want a stock quote message to be given
higher priority than a print transaction. Similarly, a $1
million stock transaction message for a premium client
should be assigned higher priority than a $100 stock
transaction message for a standard customer. Most intermediate
network devices, however, lack the ability to distinguish
among multiple traffic flows, especially those originating
from the same host or server.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a method
and apparatus for identifying one or more traffic flows from
a source entity.

It is a further object of the present invention to provide a
method and apparatus for obtaining traffic policies to be
applied to identified traffic flows.

It is a further object of the present invention to manage
traffic flows in accordance with corresponding policies.

Briefly, the invention relates to a method and apparatus
for identifying specific traffic flows originating from a
network entity and for applying predetermined policy or
service treatments to those flows. In particular, a network
entity includes a flow declaration component that is coupled
to one or more application programs executing on the entity.
The network entity also includes a communication facility
that supports message exchange between the application
program and other network entities. The flow declaration
component includes a message generator and an associated
memory for storing one or more traffic flow data structures.
For a given traffic flow, the application program calls the
flow declaration component and provides it with one or
more identifying parameters corresponding to the given
flow. In particular, the application program may provide
network and transport layer parameters, such as IP source
and destination addresses, TCP/UDP port numbers and
transport protocol associated with the given traffic flow. It
also provides one or more application-level parameters, such
as a transaction-type (e.g., a stock transaction), a
sub-transaction-type (e.g., a $1 Million stock purchase order),
etc. The flow declaration component provides this
information to a local policy enforcer, which, in turn, may query a
policy server to obtain one or more policy or service
treatments that are to be applied to the identified traffic flow.
The local policy enforcer then monitors the traffic
originating from the network entity and, by examining IP source and
destination addresses, among other information, applies the
prescribed policy or service treatments to the given traffic
flow.

In the preferred embodiment, the application program and
the flow declaration component at the network entity interact
through an Application Programming Interface (API) layer,
which includes a plurality of system calls. In addition, the
flow declaration component generates and transmits one or
more application parameter declaration (APD) messages to 
the local policy enforcer. The APD messages contain the 
network and transport layer parameters (e.g., IP source and 
destination addresses, TCP/UDP port numbers and transport 
protocol) stored at the traffic flow data structure for the given 
flow. The messages may also contain the application-level 
parameters specified by the application program. The 
information, moreover, may be in the form of objects 
generated by the flow declaration component. Preferably, the 
flow declaration component and the local policy enforcer 
exchange messages in accordance with a novel protocol that 
defines a message scheme in addition to a message format. 
The local policy enforcer and the policy server may utilize 
the Common Open Policy Service (COPS) protocol to 
request and receive particular policies or service treatment 
rules. Preferably, the policy server maintains or otherwise 
has access to a store of network policies established by the 
network administrator. 
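
The exchange summarized above, in which a flow is declared with its network and transport layer parameters plus application-level parameters and a policy server then selects a treatment, can be sketched roughly as below. This is only an illustration: the class names, the rule format and the DSCP values are hypothetical, and the sketch does not model the APD message format or the COPS protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowKey:
    """Network and transport layer parameters identifying a traffic flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"


@dataclass
class FlowDeclaration:
    """What a flow declaration component might report for one flow."""
    key: FlowKey
    app_params: dict = field(default_factory=dict)


class PolicyServer:
    """Selects a service treatment from application-level parameters."""

    def __init__(self, rules, default):
        # rules: list of (required_params, treatment) pairs, checked in
        # order; this rule format is an assumption made for the sketch.
        self.rules = rules
        self.default = default

    def decide(self, declaration):
        """Return the treatment of the first rule whose parameters match."""
        for required, treatment in self.rules:
            if all(declaration.app_params.get(name) == value
                   for name, value in required.items()):
                return treatment
        return self.default
```

A server built with the rules `[({"transaction_type": "stock_quote"}, {"dscp": 46}), ({"transaction_type": "print"}, {"dscp": 8})]` would give a declared stock quote flow the first treatment and a print flow the second, even when both flows originate from the same host.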

In another aspect of the invention, the local policy 
enforcer may establish a traffic flow state that includes the 
policy or service treatments specified by the policy server. It 
then monitors the traffic flows originating from the network 
entity looking for the given traffic flow. Once the given 
traffic flow is identified, the local policy enforcer applies the 
policy or service treatments set forth in the corresponding 
traffic flow state. For example, the policy enforcer may mark 
the packets or frames with a high priority DS codepoint. 
When the given traffic flow is complete, the application 
program may notify the flow declaration component, which, 
in turn, signals the end of the traffic flow to the local policy 
enforcer. The policy enforcer may request authorization 
from the policy server to release or otherwise discard the 
respective traffic flow state. 
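
A minimal sketch of the traffic flow state handling described in this paragraph follows. It assumes, purely for illustration, that packets are dictionaries, that a flow is keyed by its five network and transport layer parameters, and that a treatment is a dictionary that may carry a `dscp` value; none of these names come from the patent, and the authorization exchange with the policy server is omitted.

```python
class LocalPolicyEnforcer:
    """Keeps per-flow state and marks matching packets."""

    def __init__(self):
        # Maps a five-tuple to the treatment decided for that flow.
        self.flow_states = {}

    @staticmethod
    def five_tuple(packet):
        """Extract the parameters that identify the packet's flow."""
        return (packet["src_ip"], packet["dst_ip"],
                packet["src_port"], packet["dst_port"], packet["protocol"])

    def install(self, flow, treatment):
        """Establish flow state from a policy decision."""
        self.flow_states[flow] = treatment

    def process(self, packet):
        """Return the packet, marked if it belongs to a known flow."""
        treatment = self.flow_states.get(self.five_tuple(packet))
        if treatment is not None and "dscp" in treatment:
            # Copy the packet so the caller's dictionary is not modified.
            packet = dict(packet, dscp=treatment["dscp"])
        return packet

    def release(self, flow):
        """Discard the flow state once the flow has ended."""
        return self.flow_states.pop(flow, None)
```

Packets from the same host that do not match an installed flow pass through unmarked, which is how the enforcer distinguishes among several flows from one source.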

In an alternative embodiment of the invention, policy 
rules may be cached at the local policy enforcer to eliminate 
the need to query the policy server for each new traffic flow. 
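
One way such caching might look is sketched below. The choice to key the cache on the application-level parameters, the `query_policy_server` callable and the absence of any expiry or invalidation are all assumptions made for the sketch; the text does not specify how cached rules are indexed or refreshed.

```python
class CachingEnforcer:
    """Remembers policy decisions so repeated flows skip the server query."""

    def __init__(self, query_policy_server):
        # query_policy_server: hypothetical callable that takes a dict of
        # application-level parameters and returns a treatment.
        self._query = query_policy_server
        self._cache = {}
        self.server_queries = 0

    def treatment_for(self, app_params):
        """Return the cached treatment, querying the server only on a miss."""
        # Parameter values must be hashable for this cache key to work.
        cache_key = frozenset(app_params.items())
        if cache_key not in self._cache:
            self.server_queries += 1
            self._cache[cache_key] = self._query(app_params)
        return self._cache[cache_key]
```

A real enforcer would also need some way to drop stale entries when the administrator changes a policy; that part is left out here.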

In another embodiment of the invention, the APD
messages are replaced with one or more enhanced Path or
Reservation messages as originally specified in the Resource
ReSerVation Protocol (RSVP).

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and further advantages of the invention may be 
better understood by referring to the following description in 
conjunction with the accompanying drawings, in which: 

FIGS. 1A-1C, previously discussed, are partial block
diagrams of network messages;

FIG. 2 is a highly schematic block diagram of a computer 
network; 

FIG. 3 is a highly schematic, partial block diagram of a
local policy enforcer;

FIGS. 4A-4D are flow diagrams illustrating the message 
scheme and tasks performed in identifying a traffic flow and 
obtaining the corresponding policies; 

FIGS. 5A-5B are highly schematic block diagrams
illustrating the preferred format of an application parameter
declaration message; and

FIG. 6 is a highly schematic block diagram illustrating an
enhanced Resource ReSerVation Protocol (RSVP) message
in accordance with the invention.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

FIG. 2 is a highly schematic block diagram of a computer
network 200. The network 200 includes a plurality of local
area networks (LANs) 202, 204 and 206 that are
interconnected by a plurality of intermediate network devices 208,
210. Coupled to the LANs are a plurality of entities, such as
end station 212 and print server 214. The network further
includes at least one policy server 216 that may be coupled
to a repository 218 and to a network administrator's station
220. A server suitable for use as policy server 216 is any
Intel x86/Windows NT® or Unix-based platform. The
network 200 also includes at least one host or server 222
configured in accordance with the present invention.

In particular, the host/server 222 includes at least one
application program or process 224, a flow declaration
component 226 and a communication facility 228. The flow
declaration component 226 includes a message generator
230 that is in communicating relation with the
communication facility 228. Component 226 is also coupled to an
associated memory 232 for storing one or more traffic flow
data structures 234. The application program 224 is in
communicating relation with both the communication
facility 228 and, through an Application Programming Interface
(API) layer 236, the flow declaration component 226. The
communication facility 228, in turn, is connected to network
200 via LAN 206. The host/server 222 also comprises
conventional programmable processing elements (not
shown), which may contain software program instructions
pertaining to the methods of the present invention. Other
computer readable media may also be used to store the
program instructions.

The communication facility 228 preferably includes one
or more software libraries for implementing a
communication protocol stack allowing host/server 222 to exchange
messages with other network entities, such as end station
212, print server 214, etc. In particular, the communication
facility 228 may include software layers corresponding to
the Transmission Control Protocol/Internet Protocol
(TCP/IP), the Internetwork Packet Exchange (IPX) protocol, the
AppleTalk protocol, the DECNet protocol and/or NetBIOS
Extended User Interface (NetBEUI). Communication
facility 228 further includes transmitting and receiving circuitry
and components, including one or more network interface
cards (NICs) that establish one or more physical ports to
LAN 206 or other LANs for exchanging data packets and
frames.

Intermediate network devices 208, 210 provide basic
bridging functions including filtering of data traffic by
medium access control (MAC) address, "learning" of a
MAC address based upon a source MAC address of a frame
and forwarding of the frame based upon a destination MAC
address or route information field (RIF). They may also
include an Internet Protocol (IP) software layer and provide
route processing, path determination and path switching
functions. In the illustrated embodiment, the intermediate
network devices 208, 210 are computers having transmitting
and receiving circuitry and components, including network
interface cards (NICs) establishing physical ports, for
exchanging data frames. Intermediate network device 210,
moreover, is preferably configured as a local policy enforcer
for traffic flows originating from host/server 222, as
described below.
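
The basic bridging functions named above (learning, forwarding and filtering by MAC address) can be illustrated with a minimal sketch. The class name, port numbers and MAC addresses below are hypothetical and are not taken from the specification; route information field (RIF) handling and address aging are omitted.

```python
from typing import Dict, List, Optional


class LearningBridge:
    """Toy transparent bridge: learn source MACs, forward by destination MAC."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = list(ports)
        # MAC address -> port on which that address was last seen as a source.
        self.mac_table: Dict[str, int] = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports the frame should be sent out of."""
        # "Learning": remember which port the source address lives behind.
        self.mac_table[src_mac] = in_port

        out_port: Optional[int] = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment: filter (drop) the frame.
            return []
        # Known destination on another port: forward there only.
        return [out_port]


bridge = LearningBridge(ports=[1, 2, 3])
first = bridge.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")   # flooded
second = bridge.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")  # forwarded
third = bridge.receive(1, "cc:cc:cc:cc:cc:cc", "aa:aa:aa:aa:aa:aa")   # filtered
```

In this sketch the first frame is flooded because its destination has not yet been learned, the second is forwarded to a single port, and the third is filtered because source and destination sit behind the same port.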

It should be understood that the network configuration
200 of FIG. 2 is for illustrative purposes only and that the
present invention will operate with other, possibly far more
complex, network topologies. For example, the repository
218 and network administrator's station 220 may be directly
or indirectly connected to the policy server 216 (e.g.,
through one or more intermediate devices).

FIG. 3 is a partial block diagram of local policy enforcer 
210. Local policy enforcer 210 includes a traffic flow state
machine engine 310 for maintaining flow states corresponding
to host/server 222 traffic flows, as described below. The
traffic flow state machine engine 310 is coupled to a communication
engine 312. The communication engine 312 is
configured to formulate and exchange messages with the
policy server 216 and the flow declaration component 226 at
host/server 222. That is, communication engine 312 includes
or has access to conventional circuitry for transmitting and
receiving messages over the network 200. The traffic flow
state machine engine 310 is also coupled to several traffic
management resources and mechanisms. In particular, traffic
flow state machine engine 310 is coupled to a packet/frame
classifier 314, a traffic conditioner entity 316, a queue 
selector/mapping entity 318 and a scheduler 320. The traffic 
conditioner entity 316 includes several sub-components, 
including one or more metering entities 322, one or more 
marker entities 324, and one or more shaper/dropper entities 
326. The queue selector/mapping entity 318 and scheduler 
320 operate on the various queues established by local 
policy enforcer 210 for its ports and/or interfaces, such as 
queues 330a-330e corresponding to an interface 332. 

The term intermediate network device is intended broadly
to cover any intermediate device for interconnecting end
stations of a computer network, including, without
limitation, layer 3 devices or routers, as defined by Request
for Comments (RFC) 1812 from the Internet Engineering
Task Force (IETF), intermediate devices that are only partially
compliant with RFC 1812, intermediate devices that
provide additional functionality, such as Virtual Local Area
Network (VLAN) support, IEEE 802.1Q support and/or
IEEE 802.1D support, etc. Intermediate network device also
includes layer 2 intermediate devices, such as switches and
bridges, including, without limitation, devices that are fully
or partially compliant with the IEEE 802.1D standard and
intermediate devices that provide additional functionality,
such as VLAN support, IEEE 802.1Q support and/or IEEE
802.1p support, Asynchronous Transfer Mode (ATM)
switches, Frame Relay switches, etc.

FIGS. 4A-4D are flow diagrams illustrating a preferred
message scheme, relative to time t, in accordance with the
present invention. In general, application program 224 identifies
one or more anticipated traffic flows to the flow
declaration component 226, which, in turn, notifies the local
policy enforcer 210. The local policy enforcer 210 requests
and receives from the policy server 216 corresponding
policy or service treatments for the anticipated traffic flows.
Local policy enforcer 210 then monitors the traffic originating
from host/server 222 to identify those frames and/or
packets corresponding to the identified flows. When such a
flow is detected, local policy enforcer 210 applies the
specified policy or service treatments to corresponding data
frames and/or packets.
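
The final two steps of this scheme (recognizing packets of a declared flow and applying a treatment to them) might be sketched as follows. This is only an illustration under stated assumptions: the class and field names are hypothetical, the flow is keyed on an exact five-tuple with no wildcards, and the "treatment" is modeled as writing a Differentiated Services code point, which is one possible marking action and not necessarily the one the policy server would return.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (source IP, source port, destination IP, destination port, protocol)
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    dscp: int = 0  # assumed marking field; 0 means "best effort"


class LocalPolicyEnforcerSketch:
    """Hypothetical stand-in for the classify-then-mark path of enforcer 210."""

    def __init__(self) -> None:
        # Treatments previously obtained for declared flows (assumed: a DSCP value).
        self.treatments: Dict[FlowKey, int] = {}

    def install(self, key: FlowKey, dscp: int) -> None:
        self.treatments[key] = dscp

    def process(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        dscp = self.treatments.get(key)
        if dscp is not None:
            # Packet belongs to a declared flow: apply the stored treatment.
            pkt.dscp = dscp
        return pkt


enforcer = LocalPolicyEnforcerSketch()
enforcer.install(("10.0.0.5", 1098, "10.0.1.9", 4000, "tcp"), 46)
quote = enforcer.process(Packet("10.0.0.5", 1098, "10.0.1.9", 4000, "tcp"))
other = enforcer.process(Packet("10.0.0.5", 2000, "10.0.1.9", 4000, "tcp"))
```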

Identification of Traffic Flows 

Assume that application program 224 is a stock transaction
program that can provide stock quotes to and process
stock transactions from remote clients, such as end station
212. The application program 224 preferably communicates
with end station 212 across network 200 through the communication
facility 228 at host/server 222 in a conventional
manner. Program 224 also communicates with the flow
declaration component 226 preferably through a plurality of
application programming interface (API) system calls to API
layer 236. These API calls are generally issued by the
program 224 along with one or more arguments and may be
returned by the flow declaration component 226.

In particular, upon initialization at host/server 222, the
application program 224 preferably issues a StartUp( ) API
call 410 to the API layer 236 at flow declaration component
226. Program 224 preferably loads the StartUp( ) call 410
with an application identifier that uniquely identifies application
program 224 to component 226 as an argument. The
application identifier may be a globally unique identifier
(GUID), which is a 128 bit long value typically provided by
the application developer, although other identifiers may
also be used (e.g., application name). The StartUp( ) call 410
may be returned by the flow declaration component 226 with
a version number as an argument. The version number
corresponds to the version of software being executed by the
flow declaration component 226. Other arguments, such as
the quality-of-service (QoS) and/or traffic management
resources that are available to traffic flows originating from
program 224, may also be returned by flow declaration
component 226.

For example, assume end station 212 contacts program
224 and requests a stock quote for a particular equity (e.g.,
IBM common stock). Program 224 retrieves the requested
information and prepares a message containing the
requested stock quote for transmission to end station 212.
Before program 224 commences the traffic flow corresponding
to the requested stock quote, it preferably issues a
NewBindings( ) call 412 to the API layer 236 of the flow
declaration component 226. The NewBindings( ) call 412 is
used to inform flow declaration component 226 of an
anticipated traffic flow to which some policy or service
treatments should be applied. In response to the
NewBindings( ) call 412, flow declaration component 226
generates a bindings handle, e.g., H1, and creates a traffic
flow data structure 234 within associated memory 232.
Component 226 also maps or associates the traffic flow data
structure 234 with the returned bindings handle H1. Flow
declaration component 226 also returns the NewBindings( )
call 412 to program 224 with the handle H1 as an argument.
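
The handle-to-structure mapping created by the NewBindings( ) call might look like the following minimal sketch. The Python names are hypothetical, and representing a handle as a small integer is an assumption; the specification does not say how handles are encoded.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TrafficFlowData:
    """Stand-in for one traffic flow data structure 234."""
    params: Dict[str, Any] = field(default_factory=dict)


class FlowDeclarationSketch:
    """Hypothetical model of the bindings table kept by component 226."""

    def __init__(self) -> None:
        self._next_handle = itertools.count(1)       # assumed: integer handles
        self._flows: Dict[int, TrafficFlowData] = {}  # handle -> data structure

    def new_bindings(self) -> int:
        """Create an empty flow record and return the handle mapped to it."""
        handle = next(self._next_handle)
        self._flows[handle] = TrafficFlowData()
        return handle

    def lookup(self, handle: int) -> TrafficFlowData:
        return self._flows[handle]


component = FlowDeclarationSketch()
h1 = component.new_bindings()
h2 = component.new_bindings()
```

Each call yields a distinct handle, so later Set and Get calls that carry the handle can be routed to the right record.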

Next, traffic flow data structure 234 is loaded with information
identifying the anticipated traffic flow. More
specifically, program 224 next issues one or more network
and transport layer parameter "Set" API calls 414. These Set
calls 414 are used by the flow declaration component 226 to
load traffic flow data structure 234 with network and transport
layer parameters, such as Internet Protocol (IP)
addresses and TCP/UDP port numbers. For example,
program 224 may issue a SetSourcePort( ) call 414a using
the returned handle, H1, and the transport layer port number
(e.g., TCP port number 1098) to be utilized by program 224
as its arguments. In response, flow declaration component
226 loads the identified source port number (i.e., 1098) into
the traffic flow data structure 234 corresponding to handle
H1. Flow declaration component 226 may return an
acknowledgment to program 224 as an argument to the
SetSourcePort( ) call 414a. If a problem arises, flow declaration
component 226 may return an error message (e.g.,
insufficient memory, unknown handle, out of bound port
number, etc.) as the argument.
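
A minimal sketch of the acknowledgment and error behavior described for SetSourcePort( ) follows. The function signature, the status names and the dictionary used as the flow table are all assumptions made for illustration; only the error conditions themselves (unknown handle, out of bound port number) come from the text above.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    OK = "acknowledged"
    UNKNOWN_HANDLE = "unknown handle"
    PORT_OUT_OF_BOUNDS = "out of bound port number"


def set_source_port(flows: Dict[int, dict], handle: int, port: int) -> Status:
    """Load a source port into the flow record selected by the handle."""
    if handle not in flows:
        return Status.UNKNOWN_HANDLE
    if not 0 <= port <= 65535:  # TCP and UDP ports are 16-bit values
        return Status.PORT_OUT_OF_BOUNDS
    flows[handle]["source_port"] = port
    return Status.OK


flows: Dict[int, dict] = {1: {}}  # handle 1 plays the role of H1
ok = set_source_port(flows, 1, 1098)
bad_handle = set_source_port(flows, 7, 1098)
bad_port = set_source_port(flows, 1, 70000)
```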

In a similar manner, program 224 preferably causes the 
flow declaration component 226 to load the corresponding 
traffic flow data structure 234 with its IP address, the 
transport layer protocol (e.g., TCP) and the destination port 

w number and IP address of the receiving process at end station 
212. More specifically, in addition to the SctSourcePort( ) 
call 414c, program 224 may issue one or more of the 
following API system calls: 
SetSourceIP( ) 414ft; 

fi5 SetTransportProtocol( ) 414c; 
SetDestinationPort( ) 4Ud; and 
SetDestinationIP( ) 4l4e. 
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Again, program 224 uses the previously returned handle, correspond to integer, floating decimal point or alpha- 

Hl, and the corresponding information (e.g., IP address, numeric string formats. For example, program 224 may 

transport protocol or port number) as arguments to these API specify a policy element in the well-known external Data 

calls. As each Set API call 414 is received, the flow Representation (XDR) format. This XDR policy element is 

declaration component 226 loads the identified parameter 5 included as an argument in the Set( ) call 416d to the flow 

into the traffic flow data structure 234. Flow declaration declaration component 226, which, in response, simply 

component 226 may similarly return the Set API call 414 ^P 1 " thc XDR policy element into traffic flow data struc- 

with an error code or an acknowledgment as an argument. It tur . e 234 - ^ P^X c]c ™* m ^ alternatively be specified 

should be understood that additional "Set" API calls 414 ™ n g *J well-known Abstract Syntax Notation One 

may be defined depending on the format of the included 10 (ASN.l) format, or any other similar translation or encoding 

information. For example, by utilizing a techniques. 

SetSourceIPByLong( ) call (not shown), program 224 may ^ applicatioa-levcl parameters may encompass a whole 

specify its IP address as a 32 bit binary sequence. rao S c ° f ^a^n relating to different aspects of the 

Alternatively, by utilizing a SetSourceIPByString( ) call (not trafl ? c flow me ^cation program 224. For example, 

shown), program 224 may specify its IP address in dotted 15 "^cation-level parameters include such information as 

decimal format (e.g., 128.120.52.123) or as a host name uscr namc < c *- J ? ho Sn » lh )» ^cr department (e.g., 

(e.g., name.department.company.domain). In addition, a ^ m «™S' nTTf' ? f & ^ a PP hc ^T P 

single SetNetworkTransportParameters( ) system call may R/3 ' Pco Pl 6Soft > ^ application module (e.g 

be defined to set all of the network and transport layer SAP 11/3 accounting form, SAP R/3 order entry form, etc.), 

parameters at once. 20 traDsacuon tv P e ( c & > P™t), sub -transaction type (e.g., print 

, ' , , on HP Laser Jet Printer), transaction name (e.g., print 

It should be understood that application program 224 may momhl * sub 

-transaction name fe.e., print 

obtainlPsourceanddestinauonaddresses,portnumb e rsand mQJjfh[ gales QQ M > application state (e.g., 

transport protocol for use in communicating with end station Qormal mode> criUcal ^ n mode( back mode> 

212 from the communication facility 228 :m a conventional ^ etc ) Fof a Wdeo streamin appucation> the application- 

manner. It should be further understood that application lewl eters mi ht mclude user Qame) film namej film 

program 224 may utilize one or more wildcards when essioD method> film ioritV) timal bandwid th, etc. 

specifying the network and transport layer parameters. similarly, for a voice over IP application, the application- 

In addition to the network and transport layer parameters j eve i parameters may include calling party, called party, 

(e.g., source and destination-IP addresses, transport protocol 30 compression method, service level of calling party (e.g., 

and source and destination TCP/UDP port numbers) which go i dj suverj bronze), etc. In addition, for World Wide Web 

correspond to a particular flow of traffic, program 236 may (WWW) server-type applications, the application-level 

specify other identifying characteristics and/or policy ele- parameters may include Uniform Resource Locator (URL) 

ments of the anticipated traffic flow. That is, program 224 ( e g>) http://www.altavista.com/cgi-in/query?pg-aq&kl- 

may issue one or more application-level "Set" API calls 416 35 en&r-&search-Search&q-Speech+ne ar+recognition), 

to the flow declaration component 226. For example, a front-end URL (e.g., http://www.altavista.com), back-end 

Setlnteger( ) call 416a may be used (o specify some numeri- ijRL (e.g., query?pg-aq&kl^n&r-&search-Search&q- 

cal aspect (e.g., the size of a file being transferred) of the Speech-t-near+recognition), mime type (e.g., text file, image 

anticipated traffic flow. The arguments of the Setlnteger( ) fi i e; language, etc.), file size, etc. Those skilled in the art will 

call 416a include the handle HI, the numeric policy element 40 recognize that many other application-level parameters may 

(e.g., 786 Kbytes) and a policy element identifier (PID) that De defined. 

maps the numeric policy element to a particular type or class Application program 224 can also retrieve information 

of information (e.g., file size). When the traffic type data stored at the traffic flow data structure 234 by issuing one or 

structure 234 is subsequently transferred to and processed by more Get API system calls 418 (FIG. 4B). For example, 

other entities, as described below, the PID will identify its 45 program 224 may issue a GetSourcePort( ) call 418a using 

corresponding information. In response to the Setlnteger( ) th e returned bindings handle HI as an argument. In response, 

call 416a, flow declaration component 226 loads the traffic fl ow declaration component 226 parses the traffic flow data 

flow data structure 234 with the numeric policy element and structure 234 and retrieves the source port information 

the PID. Flow declaration component 226 may return the stored therein. Component 226 then returns the 

Setlntegerf ) call 416a to program 224 with an acknowl- 50 GetSourcePort( ) call 418a to program 224 with the source 

edgment or error message as arguments. port as an ar g U ment. Program 224 may issue similar Get API 

Other application-level Set calls may also be defined. For calls to retrieve other network and transport layer parameters 

example, a SetFloat( ) call 4166 is used to associate a stored at the traffic flow data structure 234. 

numeric value represented in floating decimal format with It should be understood that additional "Get" API system 

the anticipated traffic flow. A SetString( ) call 416c may be 55 calls may be defined for retrieving application-level infor- 

used to associate an alpha-numeric string with the antici- mation from the traffic flow data structure 234. 

pated flow. For example, if the anticipated traffic flow is to After issuing the application-level Set API calls 416, if 

contain a video segment, program 224 may identify the any, the corresponding traffic flow data structure 234 is 

name of the particular video segment and/or the viewer by complete. That is, data structure 234 has been loaded with 

utilizing the SetString( ) call 416c. Program 224 uses the 60 each of the identifying characteristics specified by the appli- 

handle HI and the particular alpha-numeric string as argu- cation program 224 for the anticipated traffic flow, 

ments for the SetString( ) call 416c. A PID that maps an In accordance with the invention, the flow declaration 

alpha-numeric string to name of a video segment is also component 226 also opens a communication session with 

included. This information is similarly loaded into the the local policy enforcer 210 and exchanges one or more 

corresponding traffic flow data structure 234 by the flow 65 Application Parameters Declaration (APD) messages. In the 

declaration component 226. A generic Set( ) call 416a* may preferred embodiment, the flow declaration component 226 

be used for specifying traffic flow characteristics that do not opens a reliable, connection-based "socket" session using 
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the well-know Transport Control Protocol (TCP) protocol of and a timer area 512. The header 510 includes a version field 

the TCP/IP communication protocol stack. A "socket" is 516, a flags field 518, an operation code field 520 and a 

essentially an interface between the application and trans- message length field 524. It may also include one or more 

port layers of a communication protocol stack that enables unused fields, such as field 522. Version field 516 preferably 

the transport layer to identify which process it must com- 5 contains the version of the software being implemented at 

municate with in the application layer. A socket interfaces to the flow declaration component 226. Flags field 518 pref- 

a TCP/IP communication protocol stack via APIs consisting erably contains at least one flag that may be asserted or 

of a set of entry points into the stack. Applications that de-asserted by the flow declaration component 226, as 

require TCP/IP connectivity thus use the socket APIs to described below. The operation code field 520 indicates the 

interface into the TCP/IP stack. For a connection-oriented 10 type of APD message. For a Client Open message 420, for 

protocol (such a TCP), the socket may be considered a example, field 520 is preferably loaded with the value "7". 

"session". The message length field 524 specifies the length (in octets) 

It should be understood that other protocols, including but of the Client Open message 420. 

not limited to connectionless protocols such as UDP, may be The timer area 512 includes a length field 526 which 

used to establish communication between the flow declara- 15 specifies the length (preferably in octets) of the timer area 

tion component 226 and the local policy enforcer 210. 512, a Class Number (C-Num) field 528, a Class Type 

Additionally, component 226 may communicate with local (C-Type) field 530 and a Keep Alive Timer Value field 532. 

policy enforcer 210 at the network layer by addressing IP Timer area 512 may also include one or more unused fields, 

format APD messages to end station 212 (i.e., using the 534, 536. The Class Number field 528 is loaded with an 

same destination address as the anticipated traffic flow) with 20 agreed-upon value (e.g., "11") indicating that this portion of 

the well-known Router Alert IP option asserted. Here, local the Client Open message 420 (i.e., timer area 512) contains 

policy enforcer 210 will intercept such asserted network a keep alive timer value. Where multiple types may exist for 

layer packets and may act on them itself and/or forward a given class number, the Class Type field 530 is used to 

them to some other network device. specify the particular type. Here, field 530 is preferably set 

Component 226 may be preconfigured with the IP address 25 to u 1". Flow declaration component 226 preferably loads the 

of the local policy enforcer 210 or it may dynamically obtain Keep Alive Timer Value field 532 with a proposed time 

the address of a local policy enforcer. For example, com- value (e.g., 30 seconds) to be used for maintaining the TCP 

ponent 226 or application program 224 may broadcast an session in the absence of substantive APD messages, as 

advertisement seeking the IP address of an intermediate described below. 

network device that is capable of obtaining and applying 30 Message generator 230 preferably passes the Client Open 

policy or service treatments to the anticipated traffic flow message 420 down to the communication facility 228 where 

from program 224. Local policy enforcer 210 is preferably it is encapsulated into one or more TCP packets and for- 

configured to respond to such advertisements with its IP warded to the local policy enforcer 210 in a conventional 

address. manner. The APD messages, such as the Client Open mes- 

Component 226 may receive a "virtual" address that 35 sage 420, preferably use a well-known destination port 

corresponds to a group of available local policy enforcers in number, such as 1022. The source destination port for the 

a manner similar to the Standby Router Protocol described flow declaration component 226 may be dynamically 

in U.S. Pat. No. 5,473,599, which is hereby incorporated by agreed-upon when the TCP session with the local policy 

reference in its entirety. A single "active" local policy enforcer 210 is first established. At the local policy enforcer 

enforcer may be elected from the group to perform the 40 210, message 420 is received at the communication engine 

functions described herein. 312 and passed up to the traffic flow state machine engine 

It should be further understood that the flow declaration 310. The traffic flow state machine engine 310 examines the 

component 226 preferably opens one TCP session with the message 420 which it recognizes as a Client Open message 

local policy enforcer 210 per application program 224 per due to the value (e.g., "7") loaded in the operation code field 

nelwork interface card (NIC). More specifically, if host/ 45 520. Local policy enforcer 210 may first determine whether 

server 222 is connected to network 200 through multiple it has adequate resources to accept a new client. For 

LANs (each with a corresponding NIC), then traffic flows example, local policy enforcer 210 may include an admis- 

from program 224 may be forwarded onto any of these sion control module (not shown) that determines the per- 

LANs. To ensure that the appropriate policy or service centage of time that its central processing unit (CPU) has 

treatments are applied regardless of which LAN initially 50 remained idle recently, its available memory (for storing 

carries the flow, flow declaration component 226 preferably policies associated with component 226) and the availability 

establishes a separate communication session with a local of its traffic management resources, such as meter 322, 

policy enforcer 210 through each LAN (i.e., through each marker 324 and shaper/dropper 326, to manage additional 

NIC) for every program 224 that requests services from traffic flows. 

component 226. 55 Assuming local policy enforcer 210 has sufficient avail- 
In particular, flow declaration component 226 directs able resources, it replies to the flow declaration component 
message generator 230 to formulate a Client Open message 226 with a Client Accept message 422. The format of the 
420 for forwarding to the local policy enforcer 210. The Client Accept message 422 is similar to the format of the 
Client Open message 420 establishes communication Client Open message 422 shown in FIG. 5A, In particular, 
between the local policy enforcer 210 and the flow decla- 60 the Client Accept message 422 also includes a header that is 
ration component 226 and may be used to determine similar to header 510 and a timer area that is similar to timer 
whether the local policy enforcer 210 has the resources to area 512. The operation code for the Client Accept message 
monitor the anticipated flow from the application program 422 (which is loaded in field 520) is another predefined 
224 and to apply the appropriate policy or service treat- value (e.g., "8") so that flow declaration component 226 will 
ments. FIG. 5A is a block diagram of the preferred format of 65 recognize this APD message as a Client Accept message, 
the Client Open message 420. In particular, the Client Open The traffic flow state machine engine 310 also loads a value 
message 420 includes at least two elements: a header 510 in the Keep Alive Timer Value field 532 which may corre- 
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spond to the value proposed by component 226 or may be a C-Type field 546 may also be set to "1". The device handle 

new value selected by the local policy enforcer 210. field 548 preferably contains a 2 octet identifier selected by 

The traffic flow state machine engine 310 hands the Client the local policy enforcer 210 during establishment of the 

Accept message 422 to its communication engine 312 which communication session. For example, the device handle may 

may encapsulate the message as required and forwards it to 5 be "1327". The flow handle field 550 preferably contains the 

the host/server 222. At the host/server 222 the message is flow handle H2 generated by the flow declaration compo- 

received at the communication facility 228 and passed up to nent 226 in response to the BeginFlow( ) call 424. 

the flow declaration component 226 where it is examined. Following the handle area 540 are a plurality of policy 

Flow declaration component 226 examines the operation bindings 552, such as policy bindings 552a, 5526 and 552c. 

code field 520 and "learns" that it is a Client Accept 10 The policy bindings 552 contain encoded versions of the 

message. Flow declaration component 226 also examines information stored in the traffic flow data structure 234 that 

the keep alive timer field 532 to determine what value has corresponds to the flow handle specified in field 550. Each 

been specified by local policy enforcer 210, which is used to policy binding 552, moreover, has two elements, a policy 

generate additional APD messages, as described below. identifier element 554 and an encoded policy instance ele- 

It should be understood that the flow declaration compo- 15 ment 556. Basically, the policy identifier element 554 iden- 

nent 226 may issue the Client Open message 420 as soon as tifies the type or instance of policy clement that is contained 

the StartUp( ) call 420 is issued if not earlier. in the associated encoded policy instance element 556. Each 

When application program 224 is ready to begin trans- policy identifier clement 554 includes a plurality of fields, 

milting the anticipated traffic flow (e.g., the IBM stock quote including a length field 558 (specifying its length), a policy 

form) to end station 212, it issues a BeginFlow( ) call 424a 20 identifier (Policy ID) type field 560 and a policy identifier 

to the flow declaration component. Preferably, the field 562. Each encoded policy instance element 556 simi- 

BeginFlow( ) call 424a is issued slightly before (e.g., 50 ms) larly includes a plurality of fields, including a length field 

program 224 begins forwarding the message to the commu- 564 (specifying its length), an encapsulation type field 566 

nication facility 228. It should be understood, however, that and an encoded policy element field 568. 

the BeginFlowf ) call 424c may be issued at the same time 25 The first policy binding 552a, for example, may contain 

as the anticipated flow to end station 212 is commenced or an encoded copy of the source port identified by program 

even slightly later. The application program 224 uses the 224 with the SetSourcePort( ) call 414a and stored at the 

previously returned handle HI as an argument to the respective traffic flow data structure 234. More specifically, 

BeginFlow( ) call 424a. If program 224 wishes to receive message generator 230 loads policy identifier field 562a 

any feedback regarding the policy or service treatments that 30 with the type or instance of the policy element (e.g., "source 

are applied to the respective traffic flow, it may also assert a port"). In the preferred embodiment, this name is a Policy 

flag argument in the BeginFlow( ) call 424a and add one or Identifier (PID) as specified in the Internet Engineering Task 

more callback functions as additional arguments. The call- Force (IETF) draft document COPS Usage for Differenti- 

back function preferably identifies an entry point in the ated Services submitted by the Network Working Group, 

application program 224 to which the requested feedback is 35 dated December 1998, and incorporated herein by reference 

to be returned. Program 224 may also load other information in its entirety. A PID specifies a particular policy class (e.g., 

or data that will simply be returned to it with the requested a type of policy data item) or policy instance (e.g., a 

feedback to assist program 224, for example, in mapping the particular instance of a given policy class) in a hierarchical 

returned feedback to a particular task. arrangement. The Policy ID type field 560a contains a 

The BegmFIow( ) call 424 is received and examined by 40 predefined value reflecting that field 562a contains informa- 

the flow declaration component 226, which, in part, deter- lion in PID format. Component 226 preferably includes a 

mines whether the feedback flag has been set. If so, it also Policy Information Base (PIB) for use in deriving the 

looks for any callback functions and information arguments particular policy identifiers, as described in COPS Usage for 

specified by program 224. Flow declaration component 226 Differentiated Services. 

may also return a flow handle, H2, to program 224 as an 45 The message generator 230 then accesses the source port 

argument to the BeginFlow( ) call 424. Component 226 may
also return an acknowledgment or error message as additional
arguments. Assuming that the BeginFlow( ) call 424
did not cause any errors, flow declaration component 226
then directs its message generator 230 to formulate a Flow
Start APD message 426.

FIG. 5B is a block diagram of a preferred Flow Start
message 426, which is similar to the Client Open message
420. In particular, the Flow Start message 426 includes a
header 510 having a flags field 518 and an operation code
field 520, among others. If program 224 requested policy
feedback, then message generator 230 preferably asserts the
flag in field 518. In addition, the operation code field 520 is
preferably loaded with the value "1" to indicate that this
particular APD message is a Flow Start message 426.

Following the header 510 is a handle area 540, which
includes a length field 542 (specifying the length of the
handle area 540), a Class Number (C-Num) field 544, a
Class Type (C-Type) field 546, a device handle field 548 and
a flow handle field 550. The C-Num field 544 is loaded with
an agreed-upon value (e.g., "1") indicating that this portion
of the Flow Start message 426 contains a flow handle. The

information from the respective traffic flow data structure
234 and translates it into a machine-independent format
suitable for transmission across network 200. For example,
the source port information may be translated in accordance
with the ASN.1 translation technique. The encapsulated
version of the source port is then loaded in the encoded
policy element field 568a of binding 552a. The encapsulation
type field 566a contains a predefined value reflecting
that the information in field 568a has been encapsulated
according to ASN.1. Message generator 230 similarly builds
additional bindings 552 that contain encapsulated versions
of the source IP address, transport protocol, destination port
number and destination IP address as specified by program
224 in API calls 414b-414e and stored at traffic flow data
structure 234. Message generator 230 also formulates separate
bindings 552 for each of the application-level data items
established by the application program 224 through
application-level API calls 416. Again, each of these
application-level data items may be identified by a corresponding
PID which is loaded in the Policy ID type field 562
of the respective binding 552. The application-level data
item is then translated into a machine-independent format
(e.g., through ASN.1) and loaded in the respective encoded
policy element field 568, as described above.
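
The Flow Start layout described above can be illustrated with a short Python sketch. It is not the patent's wire format: the field widths, the FLAG_FEEDBACK bit, the C-Type value, the ENCAP_ASN1 value, the PID 100 and the helper names (ber_integer, encode_binding, build_flow_start) are assumptions made for the example. Only the operation code "1", the example C-Num value "1" and the use of an ASN.1-style encoding come from the text.

```python
import struct

OP_FLOW_START = 1      # operation code given in the text for Flow Start
FLAG_FEEDBACK = 0x01   # assumed bit for "policy feedback requested"
ENCAP_ASN1 = 1         # assumed encapsulation-type value for ASN.1


def ber_integer(value: int) -> bytes:
    """Encode a small non-negative integer as an ASN.1 BER INTEGER (tag 0x02)."""
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return bytes([0x02, len(body)]) + body


def encode_binding(pid: int, value: bytes) -> bytes:
    """Assumed layout: 2-byte PID, 1-byte encapsulation type, 2-byte length."""
    return struct.pack("!HBH", pid, ENCAP_ASN1, len(value)) + value


def build_flow_start(device_handle, flow_handle, bindings, want_feedback=False):
    """Return header + handle area + bindings as bytes (all widths assumed)."""
    # Handle area: length, C-Num = 1 (flow handle), C-Type, two 32-bit handles.
    handle_area = struct.pack("!HBBII", 12, 1, 1, device_handle, flow_handle)
    body = handle_area + b"".join(encode_binding(p, v) for p, v in bindings)
    flags = FLAG_FEEDBACK if want_feedback else 0
    header = struct.pack("!BBH", flags, OP_FLOW_START, len(body))
    return header + body


# Hypothetical PID 100 stands for "source port"; 8080 is an example value.
msg = build_flow_start(7, 42, [(100, ber_integer(8080))], want_feedback=True)
```

Keeping each binding self-describing (identifier, encapsulation type, length, value) is what lets a receiver skip bindings it does not understand.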

It should be understood that other translation techniques, 
such as XDR, may also be used. It should be further 
understood that the contents of other fields, including policy 
identifier field 556, should be similarly translated into 
machine-independent format. 

The Flow Start message 426 is then handed down to the 
communication facility 228 for transmission to the local 
policy enforcer 210. At the local policy enforcer 210, the 
message 426 is captured by the communication engine 312 
and handed to the traffic flow state machine engine 310 
which parses the operation code field 520 to determine that 
the message is a Flow Start APD message. In response, the 
local policy enforcer 210 proceeds to obtain the particular 
policy rules or service treatments that are to be applied to 
this flow (e.g., a stock quote form for IBM). In particular, the
local policy enforcer 210 formulates a Request Policy message
428 for transmission to the policy server 216. In the
preferred embodiment, the format of the Request Policy 
message 428 corresponds to the Request message of the 
Common Open Policy Service (COPS) Protocol specified in 
the IETF draft document The Common Open Policy Service 
(COPS) Protocol, dated Aug. 6, 1998, and incorporated 
herein by reference in its entirety. 

According to the COPS protocol, Request messages 
include a plurality of flags, such as a request type flag and 
a message flag, and a plurality of objects. The request type 
flag for message 428 is preferably set to the COPS value that 
corresponds to "Incoming-Message/Admission Control 
Request" type COPS messages and the message type flag
should be set to "1". Furthermore, the "In-Interface" object
of the Request Policy message 428 is preferably set to the
VLAN designation associated with the local policy enforcer's
interface at which the Flow Start message 426 was
received. The bindings 552 of the Flow Start message 426,
which may not be meaningful to the local policy enforcer
210, are preferably loaded (i.e., copied as opaque objects)
into the Client Specific Information (ClientSI) object portion
of the Request Policy message 428. The local policy 
enforcer 210 also loads a unique handle that identifies the 
anticipated traffic flow from program 224 into the Request 
Policy message 428. This handle, moreover, is used in all 
messages exchanged between the local policy enforcer 210 
and the policy server 216 for this anticipated traffic flow. The 
handle may be the flow handle H2 previously returned by the 
flow declaration component 226. 
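
The way the local policy enforcer wraps the bindings without interpreting them can be sketched as follows. This is a simplified model rather than a COPS encoder: the RequestPolicy class, its field names and the example VLAN label are hypothetical, and real COPS objects have their own binary layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPolicy:
    handle: int          # identifies the flow in every enforcer/server message
    request_type: str    # COPS request type named in the text
    in_interface: str    # VLAN designation of the receiving interface
    client_si: bytes     # bindings, copied without interpretation


def build_request_policy(flow_handle: int, vlan: str, raw_bindings: bytes) -> RequestPolicy:
    """Wrap the Flow Start bindings in a request without parsing them."""
    return RequestPolicy(
        handle=flow_handle,
        request_type="Incoming-Message/Admission Control Request",
        in_interface=vlan,
        client_si=bytes(raw_bindings),
    )


request = build_request_policy(42, "VLAN-10", b"\x00\x64\x01\x00\x04\x02\x02\x1f\x90")
```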

It should be understood that intermediate network 
devices, such as local policy enforcer 210, may learn of the 
identity of the policy server 216 through any conventional 
means, such as manual configuration or a device configu- 
ration protocol. 

The Request Policy message 428 is received at the policy 
server 216, which examines the network parameters speci- 
fied for the anticipated traffic flow, including the IP 
addresses, port numbers and transport protocol. The policy 
server 216 also examines the application-level parameters 
specified by program 224 and provided to the policy server 
216 in the Request Policy message 428. Based on this 
information, the policy server 216 makes a decision regard- 
ing the policy rules or service treatments to be applied to this 
traffic flow. For example, as described in co-pending U.S. 
Patent Application Ser. No. 09/179,036, which is hereby 
incorporated by reference in its entirety, the policy server 
216 may obtain information from the repository 218 and/or 
network administrator via end station 220 and, in response, 
formulate one or more traffic management rules, such as
classification, behavioral or configuration rules. More
specifically, server 216 may formulate one or more classification
rules for instructing the local policy enforcer 210 to
classify data packets and frames from this traffic flow with
a given DS codepoint, IP Precedence and/or user priority.
Policy server 216 may also formulate one or more behavioral
rules that instruct the local policy enforcer 210 to map
packets with the given DS codepoint to a particular queue
(e.g., 330d) and to apply a particular scheduling algorithm
(e.g., WFQ). These policy decisions or rules are then loaded
into a Policy Decision message 430 and sent from the policy
server 216 to the local policy enforcer 210.
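
A policy server's decision step of this kind might look like the sketch below. The rule table, the parameter name "form", the queue labels and the decide function are invented for illustration; the DS codepoint values 46 (expedited forwarding) and 0 (best effort) are the standard Differentiated Services values, and the text does not say which values the server would actually choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    dscp: int        # classification rule: DS codepoint to mark
    queue: str       # behavioral rule: queue to map marked packets to
    scheduler: str   # behavioral rule: scheduling algorithm for that queue


# Hypothetical rules keyed on application-level parameters; first match wins.
RULES = [
    ({"form": "stock_transaction"}, Decision(46, "high", "WFQ")),
    ({"form": "stock_quote"}, Decision(0, "moderate", "WFQ")),
]
DEFAULT = Decision(0, "default", "WFQ")


def decide(app_params: dict) -> Decision:
    """Return the first rule whose match fields all equal the request's values."""
    for match, decision in RULES:
        if all(app_params.get(key) == value for key, value in match.items()):
            return decision
    return DEFAULT
```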

Communication engine 312 captures the Policy Decision
message 430 and forwards it to the traffic flow state machine
engine 310, which, in turn, extracts the policy decisions or
rules contained in the message 430. Traffic flow state
machine engine 310 preferably establishes a flow state (not
shown) for the anticipated traffic flow that includes information
identifying the anticipated traffic flow (such as IP
addresses, port numbers and transport protocol) and the
policy decisions or rules to be applied to that traffic. Traffic
flow state machine engine 310 may also build one or more
data structures (such as tables) to store the mappings contained
in the Policy Decision message 430.

As packets or frames are received at the local policy
enforcer 210, they are examined by the packet/frame classifier
314. More specifically, the packet/frame classifier 314
parses the source and destination port fields 152, 154 (FIG.
1C) and the IP source and destination address fields 126, 128
and the protocol field 124 (FIG. 1B). This information is
then supplied to the traffic flow state machine engine 310,
which determines whether a traffic flow state has been
established for such packets or frames. Assuming the packets
or frames correspond to the anticipated flow from the
program 224 to end station 212 (e.g., the IBM stock quote
form), a traffic flow state will exist and have associated
policy rules or service treatments as specified in the Policy
Decision message 430 from policy server 216. Local policy
enforcer 210 then applies the specified treatments to these
packets or frames. For example, the traffic flow state
machine engine 310 may instruct the packet/frame classifier
314 to set the DS field 132 (FIG. 1B) of such packets or frames
to a value associated with best effort traffic. Similarly, the
traffic flow state machine engine 310 may instruct the queue
selector/mapping entity 318 to place these packets or frames
in a particular (e.g., moderate priority) queue. Alternatively
or in addition, packet/frame classifier 314 may be instructed to
load the ToS field 122 (FIG. 1B) or the user_priority field
108 (FIG. 1A) with predetermined values so as to implement
these treatments at other intermediate network devices, such
as device 208.
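
The lookup performed for each packet can be pictured as a table keyed by the five network and transport layer parameters. The sketch below is an assumption-laden model: packets are plain dictionaries, and FlowKey, FlowStateTable, install and apply are hypothetical names, not elements of the described device.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


class FlowStateTable:
    def __init__(self):
        self._states = {}

    def install(self, key: FlowKey, dscp: int, queue: str) -> None:
        """Record the treatment received in a policy decision for this flow."""
        self._states[key] = {"dscp": dscp, "queue": queue}

    def apply(self, packet: dict) -> dict:
        """Return the packet with its treatment attached, or unchanged if no state exists."""
        key = FlowKey(packet["src_ip"], packet["src_port"],
                      packet["dst_ip"], packet["dst_port"], packet["protocol"])
        state = self._states.get(key)
        if state is None:
            return packet
        return {**packet, **state}


table = FlowStateTable()
table.install(FlowKey("10.0.0.1", 8080, "10.0.0.2", 80, 6), dscp=0, queue="moderate")
```

A packet whose five fields match an installed key comes back with the "dscp" and "queue" entries added; any other packet passes through untouched.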

To the extent the application program 224 requested
feedback as to the policy or service treatments applied to this
traffic flow, the local policy enforcer 210 may formulate and
send one or more Decision Feedback APD messages 432 to
the flow declaration component 226. The Decision Feedback
message 432 is similar in format to the Flow Start message
426. In particular, the Decision Feedback message 432 has
a header 510 and a handle area 540. For Decision Feedback
messages 432, the operation code field 520 is preferably
loaded with the value "3". Appended to the handle area 540
are one or more decision bindings (not shown) that are
similar in format to the policy bindings 552. In particular,
each decision binding contains a treatment specified by the
policy server 216 and applied by the local policy enforcer
210. For example, a first decision binding may provide that
the specified traffic flow is being marked with a particular
DS codepoint. Other decision bindings may specify the IP
Precedence or user_priority values being entered in fields
122, 108, respectively, of this traffic flow. Other decision
bindings may be more abstract and describe abstract service
classes granted to the traffic flow. The Decision Feedback
message 432 is received at the communication facility 228
and passed up to the flow declaration component 226. The
flow declaration component 226 extracts the particular treatments
from the decision bindings and returns them to the
application program 224 through a callback function 434
specified by the application program 224 in the
BeginFlow( ) call 424.

In order to maintain the TCP session established between 
the flow declaration component 226 and the local policy 
enforcer 210, the flow declaration component 226 may send 
one or more Keep Alive APD messages 436. The Keep Alive 
message 436 simply includes a header 510 with the opera- 
tion code field set to "9" and the message length field 524 set 
to "0". Flow declaration component 226 preferably sends at 
least one Keep Alive message 436 within every time period 
specified in the keep alive timer value field 532 of the Client 
Accept message 422. 
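
A Keep Alive sender along these lines could be sketched as below. The 4-byte header layout (flags, operation code, 16-bit length) and the choice to send at half the negotiated period are assumptions of this example; the text only fixes the operation code "9", the zero message length and the requirement of at least one message per period.

```python
import struct

OP_KEEP_ALIVE = 9  # operation code given in the text


def build_keep_alive() -> bytes:
    """Header only: flags 0, operation code 9, message length 0 (layout assumed)."""
    return struct.pack("!BBH", 0, OP_KEEP_ALIVE, 0)


class KeepAliveTimer:
    def __init__(self, period_s: float, now: float):
        self.period_s = period_s   # value taken from the Client Accept message
        self.last_sent = now

    def poll(self, now: float):
        """Return a Keep Alive message if one is due, else None."""
        # Sending at half the period leaves margin for delay (a design choice).
        if now - self.last_sent >= self.period_s / 2:
            self.last_sent = now
            return build_keep_alive()
        return None
```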

It should be understood that the policy server 216 may 
unilaterally send a Decision Change message 438 to the 
local policy enforcer 210 if a change in the previously 
supplied policy rules or service treatments occurs after the 
Policy Decision message 430 was sent. For example, the 
policy server 216 may obtain up-dated information from the 
repository 218 or from the network administrator through 
end station 220. This up-dated information may affect the 
policy rules or service treatments previously supplied to the 
local policy enforcer 210. In response, the policy server 216 
preferably formulates and sends the Decision Change message
438. The format of the Decision Change message 438
is preferably the same as the Policy Decision message 430. 
The Decision Change message 438 is similarly captured at 
the communication engine 312 of the local policy enforcer 
210 and forwarded to the traffic flow state machine engine 
310. 

To the extent the Decision Change message 438 includes 
new policy rules or service treatments, the traffic flow state 
machine 310 preferably up-dates its traffic flow state accordingly.
In addition, the traffic flow state machine 310 applies
the up-dated policy rules or service treatments to subse- 
quently received packets or frames that correspond to the 
traffic flow. The local policy enforcer 210 may also generate 
and send a Decision Feedback message (like message 432) 
to component 226 if feedback was requested by program 
224. 

The policy server 216 may also transmit one or more 
Decision messages to other intermediate network devices, 
such as device 208, that are along the path of the anticipated 
traffic flow from host/server 222 to end station 212. These 
Decision messages similarly inform the intermediate net- 
work devices as to what policy rules or service treatments to 
apply to the traffic flow from program 224, which presum- 
ably has already been classified by the local policy enforcer 
210. Policy server 216 is thus able to provide end-to-end 
quality of service support. 

It should be understood that the local policy enforcer 210 
and the policy server 216 may exchange additional COPS 
messages as required, such as COPS Client Open and COPS 
Client Accept messages among others. 

The local policy enforcer 210 may also send one or more 
Keep Alive APD messages 440 to the flow declaration 
component 226 at the host/server 222. The Keep Alive 
message 440 from the local policy enforcer 210 preferably
has the same format as Keep Alive message 436 from
component 226. 

It should be further understood that the application program
224 may change certain characteristics associated with
the traffic flow if the nature of the flow changes over time.
For example, after reviewing the quote for IBM stock, the
user at end station 212 may decide to place a "buy" order for
IBM stock. In response, application program 224 may
transmit a stock transaction form. Furthermore, the policies
or service treatments to be applied to the traffic flow corresponding
to the stock quote form may be very different from
the treatments that should be applied to the traffic flow
corresponding to the stock transaction form. Accordingly,
the program 224 may issue one or more new application-level
Set API calls 442. For example, the program may issue
a SetInteger( ) call 442a, a SetString( ) call 442b, a SetFloat( )
call 442c and/or a Set( ) call 442d. These calls are generally
the same as the previously described application-level Set
API calls 416 and, although the program 224 utilizes the
previously returned handle H1 as an argument, it enters new
or updated information (e.g., stock transaction versus stock
quote forms). In response, the flow declaration component
226 overwrites the corresponding entries in the respective
traffic flow data structure 234 with the new or up-dated
information.

The application program 224 then issues a
BeginUpdatedFlow( ) call 444 at or about the time that it
begins forwarding the stock transaction form to the user at
end station 212. The BeginUpdatedFlow( ) call 444 is
preferably the same as the BeginFlow( ) call 424 described
above. In response, the flow declaration component 226
directs the message generator 230 to generate and send a
Flow Update APD message 446 to the local policy enforcer
210. The Flow Update message 446 is similar to the Flow
Start message 426 and also includes one or more bindings
generated from the information stored in the respective
traffic flow data structure 234. Since the information contained
in the traffic flow data structure 234 has been up-dated
(through the issuance of the Set API calls 442), the bindings
will be different from the bindings appended to the original
Flow Start message 426.

At the local policy enforcer 210, the Flow Update message
446 is examined and a Request Policy Update message
448 is preferably formulated and sent to the policy server
216. The Request Policy Update message 448 has the same
general format as the original COPS Request Policy message
428, although it includes the new bindings generated as
a result of the Set API calls 442. The policy server 216
examines the Request Policy Update message 448 and, in
response, obtains the appropriate policy rules or service
treatments for this up-dated traffic flow. The policy server
216 then loads these up-dated policy rules or service treatments
in a Policy Decision Update message 450, which is
sent to the local policy enforcer 210. Since at least some of
the traffic characteristics have changed, the policies or
treatments contained in the Policy Decision Update message
450 may be different from the treatments previously provided
in the Policy Decision message 430. For example, the up-dated
policies may provide that this traffic flow is to be classified
as high priority and granted excellent effort treatment.
Similarly, the up-dated policies may provide that the DS
field 132 of packets or frames from this traffic flow should
be loaded with a DS codepoint associated with expedited
forwarding.

The Policy Decision Update message 450 is received at
the local policy enforcer 210, which modifies the corresponding
traffic flow state with the up-dated policies. The local
policy enforcer 210 also applies these up-dated policies to
any subsequently received packets or frames from the host/
server 222 that satisfy the previously identified network and
transport layer parameters (e.g., IP addresses, port numbers
and transport protocol). Local policy enforcer 210 may also
provide feedback to component 226 as described above.

When the traffic flow between the application program 
224 and end station 212 is finished, program 224 preferably 
issues a ReleaseFlow( ) call 452 to the flow declaration 
component 226 using the previously returned flow handle 
H2 as an argument. Flow declaration component 226 may 
return an acknowledgment or an error message to the 
program 224. In response, the flow declaration component 
226 directs message generator 230 to formulate a Flow End 
APD message 454. The format of the Flow End message 454 
is preferably the same as the Flow Start message 426, 
although the operation code field 520 is preferably loaded 
with "2" to signify that it is a Flow End message. Although 
the flow declaration component 226 forwards the Flow End 
message 454 to the local policy enforcer 210, it preferably 
does not discard the traffic flow data structure 234. 

In response, the local policy enforcer 210 formulates a
COPS Request message 456 to inform the policy server 216
that the respective traffic flow is finished. The policy server
216 may reply with a Decision message 458 authorizing the
local policy enforcer 210 to erase the traffic flow state which
was established for this particular flow. If the application
program 224 subsequently initiates another traffic flow with
the same end station 212, it may re-use the information
stored in the traffic flow data structure 234 by issuing
another BeginFlow( ) call 424 utilizing the previously
returned bindings handle H1. The flow declaration component
226, in response, proceeds as described above by
sending a Flow Start message 426 to the local policy
enforcer 210.

The application program 224 may also issue a
DestroyBindings( ) call 460 to the flow declaration component
226 whenever it concludes that the bindings are no
longer needed. Program 224 preferably utilizes the previously
returned bindings handle H1 as an argument to the
DestroyBindings( ) call 460. In response, component 226
preferably discards the contents of the traffic flow data
structure 234 that corresponds to bindings handle H1.
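
The handle lifecycle described in the preceding paragraphs (bindings handle H1, flow handle H2, update, release, re-use, destroy) can be modeled with a small in-memory sketch. The class and method names are Python stand-ins invented here for the BeginFlow( ), BeginUpdatedFlow( ), ReleaseFlow( ) and DestroyBindings( ) calls, messages are recorded in a list instead of being transmitted, and passing the flow handle to begin_updated_flow is an assumption, since the text does not spell out that call's arguments.

```python
import itertools


class FlowDeclarationComponent:
    def __init__(self):
        self._next = itertools.count(1)
        self._bindings = {}   # bindings handle (H1) -> {parameter id: value}
        self._flows = {}      # flow handle (H2) -> bindings handle (H1)
        self.sent = []        # messages that would go to the local policy enforcer

    def create_bindings(self) -> int:
        h1 = next(self._next)
        self._bindings[h1] = {}
        return h1

    def set_value(self, h1: int, pid: str, value) -> None:
        self._bindings[h1][pid] = value   # a later Set call overwrites the entry

    def begin_flow(self, h1: int) -> int:
        h2 = next(self._next)
        self._flows[h2] = h1
        self.sent.append(("FLOW_START", h2, dict(self._bindings[h1])))
        return h2

    def begin_updated_flow(self, h2: int) -> None:
        h1 = self._flows[h2]
        self.sent.append(("FLOW_UPDATE", h2, dict(self._bindings[h1])))

    def release_flow(self, h2: int) -> None:
        self._flows.pop(h2)               # the bindings themselves are kept
        self.sent.append(("FLOW_END", h2, {}))

    def destroy_bindings(self, h1: int) -> None:
        del self._bindings[h1]
```

Separating the two handles is what allows a released flow's bindings to be reused by a later begin_flow call without re-entering every parameter.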

When the application program 224 is closed, it should
shut down all outstanding traffic flow services by issuing
corresponding ReleaseFlow( ) calls 452 and it should also 
destroy all bindings that it created by issuing 
DestroyBindings( ) calls 460. In response, component 226 
directs message generator 230 to formulate a Client Close 
APD message 462. The Client Close message 462 is simply 
a header 510 with the operation code field 520 loaded with 
the value "10". In response, the local policy enforcer 210 
formulates and sends a COPS Request message 464 to the 
policy server 216 indicating that the program 224 is closed. 
The policy server 216 may reply with a COPS Decision 
message 466 instructing the local policy enforcer 210 to 
release all of the corresponding traffic flow states that were
previously established for the application program 224. 

One skilled in the art will recognize that two or more of 
the previously described API system calls may be combined 
into a single call or that any one call may be broken down 
into multiple calls. One skilled in the art will also recognize 
that the particular names of the API system calls are unimportant.
Thus, it is an object of the present invention to cover
the foregoing communicating relation between the applica- 
tion program 224 and the flow declaration component 226, 
regardless of the particular implementation ultimately chosen.

It should also be understood that any set of values may be
inserted in the operation code field 520 of the APD messages
provided that each APD message type (e.g., Client Open,
Client Accept, Flow Start, etc.) has a different value assigned
to it. Furthermore, if a local policy enforcer is unable to
handle a particular application program or traffic flow (e.g.,
insufficient memory or other resources), it preferably
responds to the Client Open message with a Client Close
message, rather than a Client Accept message.
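
The requirement that every message type has its own operation code can be enforced mechanically. The sketch below lists only the values stated in this part of the description; the ApdOpCode and message_type names, and the position of the operation code as the second header byte, are assumptions of the example.

```python
from enum import IntEnum, unique


@unique  # raises ValueError at class creation if two names share a value
class ApdOpCode(IntEnum):
    FLOW_START = 1
    FLOW_END = 2
    DECISION_FEEDBACK = 3
    KEEP_ALIVE = 9
    CLIENT_CLOSE = 10


def message_type(header: bytes) -> ApdOpCode:
    """Read the operation code, assumed here to be the second header byte."""
    return ApdOpCode(header[1])
```

An unknown code raises ValueError, which gives a receiver a natural point at which to reject a message it cannot interpret.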

In the preferred embodiment, the flow declaration component
226 is implemented in software as a series of steps
executed at the host/server 222. Nonetheless, it should be
understood that the method may be implemented, either
wholly or in part, through one or more computer hardware
devices. Additionally, the present invention is preferably
utilized only with traffic flows of sufficient length (e.g.,
greater than 5-10 packets). The application program 224
may be configured not to request bindings or issue API calls
for short traffic flows.

It should be understood that some or all of the above
described functionality of the local policy enforcer 210 may
be located at the host/server 222. For example, the host/
server 222 may include a traffic flow state machine engine
310 that is capable of sending and receiving COPS Request
and Decision messages directly to and from the policy server
216. In this case, the Client Open, Flow Start and Flow
Update messages are simply inter-process communications
within the host/server 222, rather than being forwarded
across the network. The operating system at the host/server
222 may also include one or more resources that may be
utilized to provide traffic management services, such as
classifying packets and frames (e.g., loading the DS field
132, ToS field 122 and/or user_priority field 108), scheduling
packet and frame forwarding from different priority
queues, etc.

It should be further understood that the local policy
enforcer 210 may make policy or service treatment decisions
for traffic flows identified by the flow declaration component
226 without querying the policy server 216. That is, the local
policy enforcer 210 may cache certain policy rules or
treatments.

In another aspect of the invention, the application program
224 may request policy decisions in advance of issuing
the BeginFlow( ) call 424. For example, program 224 may
only have a small number of application-level parameter
bindings. After creating the bindings (using only the
application-level parameters) as described above, the program
224 may issue a GetFlowDecision( ) system call to
component 226 and, in return, receive a handle, H3. Component
226 issues an Obtain Decision APD message to the
local policy enforcer 210 for each binding, including the
specified application-level parameters. The local policy
enforcer 210 will obtain the appropriate policy rules or
service treatments to be applied to these, as yet un-specified,
"flows" as described above.

When program 224 is about to begin a flow corresponding
to one of these bindings, it may issue a BeginFlow( ) call,
including the network and transport layer parameters for the
traffic flow and the handle H3 for the corresponding
application-level bindings. Component 226 then forwards
this information in a Flow Start message 426 to the local
policy enforcer 210 as described above. Since the local
policy enforcer 210 has already obtained the policy or
service treatments to be applied to this flow, it need not
query the policy server 216. Instead, the local policy
enforcer 210 simply monitors the traffic from host/server
222 and, when it identifies the specified traffic flow, applies
the previously received policy rules or service treatments.
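
Both the caching behavior and the advance request for decisions amount to consulting a local store before contacting the policy server. A minimal sketch, assuming the application-level parameters are hashable key/value pairs and that query_policy_server is a caller-supplied function (both hypothetical):

```python
class CachingEnforcer:
    def __init__(self, query_policy_server):
        self._query = query_policy_server   # callable: params dict -> decision
        self._cache = {}

    def obtain_decision(self, app_params: dict):
        """Return a cached decision, querying the policy server only on a miss."""
        key = frozenset(app_params.items())
        if key not in self._cache:
            self._cache[key] = self._query(app_params)
        return self._cache[key]
```

When a Flow Start later arrives with the same application-level parameters, obtain_decision returns immediately, so the only remaining work is matching packets against the flow's network and transport layer parameters.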

Enhanced RSVP Messaging

In a further aspect of the invention, the flow declaration 
component 226 may be configured to exchange one or more 
modified Resource reSerVation Protocol (RSVP) messages 
with the local policy enforcer 210 in place of the APD 
messages described above. RSVP is a well-known Internet
Control protocol for reserving resources, typically
bandwidth, between a sender entity and a receiver entity.
RSVP is defined at Request for Comments (RFC) 2205,
September 1997, from the Network Working Group of the
IETF, and is hereby incorporated by reference in its entirety.
The protocol defines two fundamental message types: RSVP
path messages (Path) and reservation request messages
(Resv). Basically, senders transmit Path messages downstream
throughout the network to potential receivers offering
to supply a given message stream. Receivers, wishing to 
obtain the proposed message stream, transmit Resv mes- 
sages that are propagated upstream all the way back to the 
sender. At each intermediate node in the network, bandwidth 
resources are reserved to ensure that the receiver will obtain 
the message stream. 

In this embodiment of the present invention, component 
226, rather than generating and forwarding the Flow Start 
APD message 426 in response to the BeginFlow( ) call 424, 
formulates and sends a modified RSVP Path message to the 
local policy enforcer 210. FIG. 6 is a block diagram illus- 
trating the preferred format of a modified RSVP Path 
message 610. Modified Path message 610 carries the network
and transport layer parameters and application-level
parameters specified for the anticipated traffic flow. In 
particular, message 610 preferably includes at least three 
elements: an RSVP header 612, a first area 614 (which 
carries the network and transport layer parameters) and at 
least one RSVP Policy_Data object 616 (which carries the 
application-level parameters). As provided in RFC 2205, the 
RSVP header includes a version field 618, a flags field 620, 
a message type field 622, an RSVP checksum field 624, a 
Send Time To Live (TTL) field 626, a reserved field 628 and 
an RSVP length field 630. 

Component 226 preferably loads version field 618, which 
corresponds to the version of RSVP, with the appropriate 
value (e.g., "1"). Flags field 620 is preferably de-asserted as 
no flags are presently defined. Message type field 622, which 
indicates the type of message (e.g., "1" for RSVP Path 
messages and "2" for RSVP Resv messages) is preferably 
loaded with the value "1" to indicate that message 610 is a 
Path message. It should be understood that field 622 may 
alternatively be loaded with a new value to indicate that 
message 610 is a modified RSVP Path message. The RSVP 
Checksum field 624 may be loaded with a computed checksum
for message 610. The Send_TTL field 626 is preferably
loaded with an IP time to live value, and the RSVP
length field 630 preferably contains the length of message 
610. 
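
The header fields listed above follow the RFC 2205 common header, which occupies eight bytes: a 4-bit version and 4-bit flags sharing one byte, then message type, checksum, Send_TTL, a reserved byte and the total length. The sketch below packs that layout; the function name and default arguments are choices made for this example, and the checksum is left at zero rather than computed.

```python
import struct

RSVP_PATH = 1   # message type values named in the text
RSVP_RESV = 2


def pack_rsvp_header(msg_type, send_ttl, rsvp_length, checksum=0, version=1, flags=0):
    """Pack the 8-byte RSVP common header in network byte order."""
    vers_flags = ((version & 0x0F) << 4) | (flags & 0x0F)
    return struct.pack("!BBHBBH", vers_flags, msg_type, checksum,
                       send_ttl, 0, rsvp_length)


header = pack_rsvp_header(RSVP_PATH, send_ttl=64, rsvp_length=8)
```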

The first area 614 preferably includes an RSVP sender
template object 632 and an RSVP session object 634, each
having a plurality of fields. More specifically, the sender
template and session objects 632, 634 each have a length
field 638 (loaded with the length of the respective object), a
class number (C-Num) field 640 and a class type (C-Type)
field 642. For the sender template object 632, which further
includes an IP source address (SA) field 644, a source port
number field 646 and may include one or more un-used
fields 648, the respective C-Num field 640 is preferably
loaded with "11" to signify that it is an RSVP sender
template object and the respective C-Type field 642 may be
loaded with "1" to indicate that fields 644 and 646 carry the
IPv4 address and the TCP/UDP port number, respectively, at
host/server 222 for the anticipated traffic flow. For the
session object 634, which further includes an IP destination
address (DA) field 650, a transport protocol field 652, a flags
field 654 and a destination port number field 656, the
respective C-Num field 640 is loaded with "1" to signify that
it is an RSVP session object and the respective C-Type field
642 may be loaded with "1" to indicate that fields 650 and
656 carry the IPv4 address and the TCP/UDP port number,
respectively, for the corresponding process at end station
212 for the anticipated traffic flow. Component 226 may
assert flags field 654 if it is capable of policing its own traffic
flows.

One skilled in the art will recognize that first area 614 of 
modified RSVP Path message 610 may be modified in any 
number of ways, including fewer or additional fields or to 
carry IPv6 information. 

The RSVP Policy_Data object 616 also has a length field
638, a C-Num field 640 and a C-Type field 642. In addition,
RSVP Policy_Data object 616 includes a policy_data
object field 658. The respective length field 638 carries the
length of object 616 and the respective C-Num field is
loaded with "14" to indicate that field 658 is a policy_data
object field. The C-Type field 642 of object 616 is preferably
loaded with a new value (e.g., "2") to signify that policy_data
object field 658 carries application-level parameters.
Furthermore, policy_data object field 658 is loaded by
component 226 with the application-level bindings specified
by program 224 preferably in the manner as described above
with reference to FIG. 5B.
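
The three objects of the modified Path message can be assembled as in the sketch below. The object header (16-bit length, C-Num, C-Type) and the IPv4 sender template and session bodies follow RFC 2205; the helper names are invented for this example, the bindings are treated as an opaque byte string, and the C-Type value 2 for Policy_Data is the example value from the text rather than a registered one.

```python
import socket
import struct


def pack_object(class_num: int, c_type: int, body: bytes) -> bytes:
    """Prefix a body with the RSVP object header; objects are padded to 4 bytes."""
    if len(body) % 4:
        body += b"\x00" * (4 - len(body) % 4)
    return struct.pack("!HBB", 4 + len(body), class_num, c_type) + body


def sender_template(src_ip: str, src_port: int) -> bytes:
    # C-Num 11, C-Type 1: IPv4 source address, two unused bytes, source port.
    return pack_object(11, 1, socket.inet_aton(src_ip) + struct.pack("!HH", 0, src_port))


def session(dst_ip: str, protocol: int, dst_port: int, flags: int = 0) -> bytes:
    # C-Num 1, C-Type 1: IPv4 destination address, protocol, flags, destination port.
    return pack_object(1, 1, socket.inet_aton(dst_ip) + struct.pack("!BBH", protocol, flags, dst_port))


def policy_data(bindings: bytes) -> bytes:
    # C-Num 14; C-Type 2 marks application-level parameters (example value).
    return pack_object(14, 2, bindings)
```

Concatenating the session, sender template and policy data objects after the common header yields the body of the modified Path message.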

One skilled in the art will also recognize that the 
application-level parameters may be carried in multiple 
RSVP Policy_Data objects 616.

This modified RSVP Path message 610 is preferably
handed to the communication facility 228 for forwarding to
the local policy enforcer 210 where it is examined. In
response, the local policy enforcer 210 and the policy server
216 exchange Request Policy 428 and Policy Decision 430
messages, as described above, in order to obtain the policy
rules or service treatments to be applied to the traffic flow
identified in the modified RSVP Path message 610. Local
policy enforcer 210 also extracts and stores the network and
transport layer parameters from the RSVP Sender Template
object 614 in order to identify the particular traffic flow from
host/server 222.

The local policy enforcer 210 may also reply to component
226 with a modified RSVP Resv message rather than
the Decision Feedback message 432. This modified RSVP
Resv message preferably includes a header similar to header
612, but with the message type field 622 loaded with the
value "2" to indicate that it is an RSVP Resv message or
with a new value to indicate that it is a modified RSVP Resv
message. The modified RSVP Resv message also includes
one or more RSVP Policy_Data objects similar to object
616. In this case, however, object 616 carries the decision
bindings for the anticipated traffic flow as described above.
Component 226 may extract these decision bindings in order
to provide feedback to application 224.

As shown, component 226 utilizes a modified RSVP Path
message 610 to identify network and transport layer parameters
and application-level parameters to the local policy
enforcer 210. The modified RSVP Path message 610,
moreover, is preferably not forwarded by the local policy
enforcer 210, unlike conventional RSVP Path and Resv
messages which are propagated all the way between the
sender and receiver entities.
It should be understood that the local policy enforcer 210
is preferably in close proximity to host/server 222 so that the
classification of packets or frames from the anticipated
traffic flow occurs early in their journey through the network
200. It should also be understood that the traffic flow from
end station 212 to host/server 222 may similarly be identified
and appropriate policy rules or service treatments applied
thereto. It should be further understood that the flow
declaration component 226 is configured to handle and
separately identify multiple traffic flows from multiple
application programs executing at the host/server 222 so that
the appropriate policy rules or service treatments may be
individually applied to each such traffic flow through the
network 200. For example, program 224 may be simultaneously
sending a print transaction to the print server 214.

The foregoing description has been directed to specific
embodiments of the invention. It will be apparent, however,
that other variations and modifications may be made to the
described embodiments, with the attainment of some or all
of their advantages. For example, other client-server
communications protocols, besides COPS, may be utilized by
the policy server and the local policy enforcer. In addition,
the present invention may also be utilized with other network
layer protocols, such as IPv6, whose addresses are 128
bits long. Therefore, it is the object of the appended claims
to cover all such variations and modifications as come
within the true spirit and scope of the invention.

What is claimed is:

1. A network entity configured to communicate with a
local policy enforcer through a computer network, the
network entity having at least one application program
executing thereon for generating a traffic flow for
transmission to a second network entity through the network, the
network entity comprising:

a flow declaration component in communicating relation
with the at least one application program for receiving
one or more network and transport layer parameters
and one or more application-level parameters identifying
the traffic flow, the flow declaration component
comprising:

a memory for storing a traffic flow data structure
corresponding to the traffic flow, the traffic flow data
structure storing the one or more network and transport
layer and one or more application-level parameters
identified by the at least one application
program, and

a message generator for formulating and transmitting
one or more messages to the local policy enforcer, at
least one message including information from the
traffic flow data structure,

whereby, in response to the at least one message from the
flow declaration component, a respective service treatment
is declared for the traffic flow from the at least one
application program.

2. The network entity of claim 1 wherein the at least one
application program communicates with the flow declaration
component through one or more Application Programming
Interface (API) system calls.

3. The network entity of claim 2 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program,
associates the traffic flow data structure with the at least one
application program.

4. The network entity of claim 3 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program that
specify one or more network and transport layer parameters
for the traffic flow, fills the traffic flow data structure with the
specified one or more network and transport layer parameters.

5. The network entity of claim 4 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program
specifying a start of the traffic flow, generates a message
having the specified one or more network and transport layer
parameters for transmission to the local policy enforcer.

6. The network entity of claim 3 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program that
specify one or more application-level parameters for the
traffic flow, fills the traffic flow data structure with the
specified one or more application-level parameters.

7. The network entity of claim 6 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program
specifying a start of the traffic flow, generates a message
having the specified one or more application-level parameters
for transmission to the local policy enforcer.

8. The network entity of claim 4 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program that
specify one or more application-level parameters for the
traffic flow, fills the traffic flow data structure with the
specified one or more application-level parameters.

9. The network entity of claim 8 wherein the flow
declaration component, in response to one or more API
system calls from the at least one application program
specifying a start of the traffic flow, generates a first message
having the specified one or more network and transport layer
parameters and the specified one or more application-level
parameters for transmission to the local policy enforcer.

10. The network entity of claim 9 wherein the first
message generated by the flow declaration component is a
Resource reSerVation Protocol (RSVP) Path message that includes:

a first area carrying the specified one or more network and
transport layer parameters; and

one or more policy data objects carrying the specified one
or more application-level parameters.

11. The network entity of claim 10 wherein the first area
of the Resource reSerVation Protocol (RSVP) Path message
includes a session object and a sender template object.

12. A network entity configured to communicate with a
policy server through a computer network, the network
entity having at least one application program executing
thereon for generating a traffic flow for transmission to a
second network entity through the network, the network
entity comprising:

a flow declaration component in communicating relation
with the at least one application program for receiving
one or more network and transport layer parameters
and one or more application-level parameters identifying
the traffic flow, the flow declaration component
having a memory for storing a traffic flow data structure
corresponding to the traffic flow, the traffic flow data
structure storing the one or more network and transport
layer and the one or more application-level parameters
identified by the at least one application program; and

a traffic flow state machine engine in communication with
the flow declaration component, the traffic flow state
machine engine including a communication engine for
formulating and transmitting one or more messages to
the policy server carrying information from the traffic
flow data structure and for receiving one or more policy
decision rules from the policy server to be applied to
the traffic flow.

13. The network entity of claim 12 wherein the flow
declaration component, in response to one or more Application
Programming Interface (API) system calls from the at
least one application program that specify one or more
network and transport layer parameters and one or more
application-level parameters for the traffic flow, fills the
traffic flow data structure with the specified one or more
network and transport layer parameters and the one or more
application-level parameters.

14. The network entity of claim 13 wherein the communication
engine at the traffic flow state machine engine, in
response to one or more API system calls from the at least
one application program specifying a start of the traffic flow,
generates a first message having the specified one or more
network and transport layer parameters and the one or more
specified application-level parameters for transmission to
the policy server.

15. In a computer network having a first network entity,
a local policy enforcer, a policy server and a second network
entity, the first network entity having a flow declaration
component and at least one application program that are in
communicating relation, the at least one application program
configured to generate a traffic flow for transmission through
the network to the second network entity, a method for
obtaining and applying policy rules to the traffic flow
comprising the steps of:

specifying one or more network and transport layer
parameters for the traffic flow to the flow declaration
component;

specifying one or more application-level parameters that
describes an aspect of the traffic flow to the flow
declaration component;

forwarding at least one message carrying the specified
network and transport layer parameters and the specified
application-level parameters from the flow declaration
component to the local policy enforcer;

at the local policy enforcer, requesting a policy rule
decision for application to the traffic flow from the
policy server based on the specified network and transport
layer parameters and the specified application-level
parameters; and

at the local policy enforcer, applying the policy rule
decision to the traffic flow as it moves through the
network.

16. The method of claim 15 wherein the at least one
application program specifies the one or more network and
transport layer parameters and the one or more application-level
parameters to the flow declaration component through
one or more Application Programming Interface (API) system
calls.

17. The method of claim 16 further wherein the traffic
flow has a start and the method further comprises the step of
notifying the flow declaration component of the start of the
traffic flow and further wherein the flow declaration
component, in response, forwards the at least one message
to the local policy enforcer.

18. The method of claim 17 wherein the at least one
message forwarded by the flow declaration component is a
Resource reSerVation Protocol (RSVP) Path message that
includes:

a first area carrying the specified one or more network and
transport layer parameters; and

one or more policy data objects carrying the specified one
or more application-level parameters.

19. The method of claim 18 wherein the first area of the
Resource reSerVation Protocol (RSVP) Path message
includes a session object and a sender template object.

20. A computer readable medium containing executable
program instructions for declaring a service treatment for a
plurality of network messages issued by an application
program running on a network entity connected to a computer
network, the network messages corresponding to a
specific traffic flow, the computer network including a policy
enforcer and a policy server and defining transport and
network communication layers, the executable program
instructions comprising program instructions for:

receiving from the application program a plurality of
network and transport layer parameters corresponding
to the traffic flow;

receiving from the application program one or more
application-level parameters corresponding to the traffic
flow;

loading the received network and transport layer parameters
and the application-level parameters into one or
more flow start messages; and

sending the one or more flow start messages to the policy
enforcer, wherein, in response to the one or more flow
start messages, a service treatment is obtained for and
applied to the traffic flow from the application program.

21. The computer readable medium of claim 20 further
comprising program instructions for receiving a notification
from the application program indicating that the program is
ready to begin sending the network messages of the traffic
flow.

22. The computer readable medium of claim 21 further
comprising program instructions for:

receiving a notification from the application program
indicating that the program has completed its sending
of messages corresponding to the traffic flow, and

issuing a flow end message to the policy enforcer signaling
the end of the traffic flow.

23. The computer readable medium of claim 22 wherein
the application-level parameters specify one or more of the
following characteristics: the size of a file being transmitted,
a video segment name, a video viewer, a user name,
a user department, an application module identifier, a
transaction type, a transaction name, an application state, a
calling party, a called party, a compression method, a service
level, a uniform resource locator (URL) and a mime type.

24. The computer readable medium of claim 23 further
comprising program instructions for loading the received
network and transport layer parameters and the received
application-level parameters into a traffic flow data structure
associated with the application program.

25. The computer readable medium of claim 20 wherein
the one or more flow start messages contain one or more
policy bindings, the policy bindings representing encoded
versions of the network and transport layer parameters
received from the application program.

26. The computer readable medium of claim 25 wherein
the policy bindings further represent encoded versions of the
application-level parameters received from the application
program.

27. The computer readable medium of claim 26 wherein
each policy binding includes a policy identifier (PID) element
and an encoded policy instance element.

28. The computer readable medium of claim 27 wherein
the PID is used to specify a type of class of the network and
transport layer parameters and/or the application-level
parameters.

29. The computer readable medium of claim 27 wherein
the PID elements comply with the COPS Usage for Differentiated
Services specification standard.

30. The computer readable medium of claim 26 further
comprising program instructions for translating the 
application-level parameters into a machine independent 
format. 

31. The computer readable medium of claim 30 wherein 
the machine independent format is Abstract Syntax Notation 
One (ASN.1).

32. The computer readable medium of claim 20 further 
comprising programming instructions for providing the ser- 
vice treatments to the application program through a call- 
back function. 

33. The computer readable medium of claim 20 further 
comprising programming instructions for sending one or 
more client open messages to the policy enforcer in order to 
open a communication session with the policy enforcer, the 
client open messages carrying a keep alive timer value. 

34. The computer readable medium of claim 33 further 
comprising programming instructions for receiving one or 
more client accept messages from the policy enforcer, the 
Client Accept messages carrying a keep alive timer value. 

35. The computer readable medium of claim 34 further 
comprising programming instructions for issuing one or 
more keep alive messages to the policy enforcer while the 
application program continues to send network messages 
corresponding to the traffic flow, the keep alive message sent 
substantially in accordance with the keep alive timer value
received from the policy enforcer. 

36. The computer readable medium of claim 33 wherein 
a separate communication session is opened with the policy 
enforcer for each application program for which a service 
treatment is to be declared. 

37. The computer readable medium of claim 21 further 
comprising program instructions for receiving from the 
application program a change in the application-level 
parameters for the network messages corresponding to the 
traffic flow. 

38. The computer readable medium of claim 37 further 
comprising program instructions for: 

receiving a notification from the application program 
indicating that the program is ready to begin sending 
the network messages corresponding to the changed 
application-level parameters; and 

issuing one or more flow update messages to the policy 
enforcer, the flow update messages containing the 
changed application-level parameters. 

39. The computer readable medium of claim 38 wherein 
a new service treatment is obtained for and applied to the
network messages corresponding to the changed
application-level parameters from the application program. 

40. The computer readable medium of claim 24 wherein 
the service treatment is obtained in response to the policy
enforcer sending one or more request policy messages to the
policy server. 

41. The computer readable medium of claim 40 wherein 
the request policy messages comply in substantial part with 
the Common Open Policy Service (COPS) Protocol. 

42. The computer readable medium of claim 41 wherein
the policy server, in response to the request policy messages, 
issues one or more policy decision messages to the policy 
enforcer, the policy decision messages containing the ser- 
vice treatment for the traffic flow from the application 
program. 

43. The computer readable medium of claim 42 wherein
the policy enforcer establishes a flow state for the traffic flow
from the application program, the flow state including the 
declared network and transport layer parameters and the 
service treatment returned by the policy server.

44. The computer readable medium of claim 43 wherein
the policy enforcer

compares messages originated by the application program
with the declared network and transport layer
parameters, and

applies the service treatment to messages matching the
network and transport layer parameters.

45. The computer readable medium of claim 44 wherein
the policy enforcer, in applying the service treatment,
performs one or more of:

setting a Differentiated Services (DS) codepoint field of
matching network messages from the application
program,

setting a Type of Service (ToS) field of matching network
messages from the application program, and

setting a user_priority field of matching messages from
the application program.

46. The computer readable medium of claim 43 further 
comprising program instructions for discarding the contents 
of the traffic flow data structure in response to receiving the
notification from the application program that the program
has completed its sending of messages. 

47. The computer readable medium of claim 46 wherein 
the policy enforcer, in response to the flow end message, 
erases the traffic flow state established for the traffic flow 
from the application program. 

***** 



01/21/2004, EAST Version: 1.4.1 



